Re: Sync and Data Replication
On Sun, Jun 10, 2012 at 9:39 AM, Harsh J <[EMAIL PROTECTED]> wrote:

> Mohit,
> On Sat, Jun 9, 2012 at 11:11 PM, Mohit Anchlia <[EMAIL PROTECTED]>
> wrote:
> > Thanks, Harsh, for the detailed info. It clears things up. The only thing
> > on those pages that concerns me is what happens when the client crashes.
> > It says you could lose up to a block's worth of information. Is this still
> > true given that the NN would auto-close the file?
> Where does it say this exactly? It is true that immediate readers will
> not get the last block (as it remains open and uncommitted), but once
> the lease recovery kicks in the file is closed successfully and the
> last block is indeed made available, so there's no 'data loss'.

I saw it in "Coherency Model" -> "consequences of application design"

Thanks for the information. At least I don't have to worry about data loss
when a stream is synced but not closed.

> > Is it good practice to reduce the NN default value so that it auto-closes
> > before 1 hr?
> I've not seen people do this or need to do this. Most don't run into such
> a situation, and it is vital to properly close() files or sync() on
> file streams before making them available to readers. HBase manages open
> files during WAL recovery using lightweight recoverLease APIs that
> were added for its benefit, so it doesn't need to wait an hour for
> WALs to close and recover data.
> --
> Harsh J
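
For reference, a minimal sketch of the polling pattern behind the recoverLease approach mentioned above. In HDFS the real call is DistributedFileSystem.recoverLease(Path), which returns true once the file is closed. Everything else here (the LeaseRecoveryPoll class, waitUntilClosed, the retry count and sleep interval) is hypothetical and only illustrates the idea, with a stand-in in place of a live cluster:

```java
import java.util.function.BooleanSupplier;

public class LeaseRecoveryPoll {

    /**
     * Repeatedly invokes `attempt` until it reports that the file is closed.
     * Returns the number of attempts used, or -1 if the file was still open
     * after maxAttempts tries.
     *
     * Against a real cluster, `attempt` would wrap something like
     * dfs.recoverLease(path) (and handle its checked IOException);
     * that wiring is an assumption and is not shown here.
     */
    public static int waitUntilClosed(BooleanSupplier attempt,
                                      int maxAttempts,
                                      long sleepMillis) throws InterruptedException {
        for (int i = 1; i <= maxAttempts; i++) {
            if (attempt.getAsBoolean()) {
                return i; // file reported closed; last block is now readable
            }
            if (i < maxAttempts) {
                Thread.sleep(sleepMillis); // give block recovery time to finish
            }
        }
        return -1;
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for recoverLease: reports "closed" on the third call.
        int[] calls = {0};
        BooleanSupplier fakeRecoverLease = () -> ++calls[0] >= 3;

        int used = waitUntilClosed(fakeRecoverLease, 5, 10L);
        System.out.println("closed after attempts: " + used);
    }
}
```

The point of polling instead of lowering the NN timeout is that only the reader that needs the file pays the cost, and only for that one file.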