Thanks Harsh for detailed info. It clears things up. Only thing from those
page is concerning is what happens when client crashes. It says you could
lose upto a block worth of information. Is this still true given that NN
would auto close the file?
Is it a good practice to reduce NN default value so that it auto-closes
before 1 hr.
Regarding OS cache, I think it should be ok since chances of loosing
replica nodes all at the same time is low.
On Sat, Jun 9, 2012 at 5:13 AM, Harsh J <[EMAIL PROTECTED]> wrote:
> Hi Mohit,
> > In this scenario is data also replicated as defined by the replication
> factor to other nodes as well? I am wondering if at this point if crash
> occurs do I have data in other nodes?
> What kind of crash are you talking about here? A client crash or a
> cluster crash? If a cluster, is the loss you're thinking of one DN or
> all the replicating DNs?
> If client fails to close a file due to a crash, it is auto-closed
> later (default is one hour) by the NameNode and whatever the client
> successfully wrote (i.e. into its last block) is then made available
> to readers at that point. If the client synced, then its last sync
> point is always available to readers and whatever it didn't sync is
> made available when the file is closed later by the NN. For DN
> failures, read on.
> Replication in 1.x/0.20.x is done via pipelines. Its done regardless
> of sync() calls. All write packets are indeed sent to and acknowledged
> by each DN in the constructed pipeline as the write progresses. For a
> good diagram on the sequence here, see Figure 3.3 | Page 66 | Chapter
> 3: The Hadoop Distributed Filesystem, in Tom's "Hadoop: The Definitive
> Guide" (2nd ed. page nos. Gotta get 3rd ed. soon :))
> The sync behavior is further explained under the 'Coherency Model'
> title at Page 68 | Chapter 3: The Hadoop Distributed Filesystem of the
> same book. Think of sync() more as a checkpoint done over the write
> pipeline, such that new readers can read the length of synced bytes
> immediately and that they are guaranteed to be outside of the DN
> application (JVM) buffers (i.e. flushed).
> Some further notes, for general info: In 0.20.x/1.x releases, there's
> no hard-guarantee that the write buffer flushing done via sync ensures
> the data went to the *disk*. It may remain in the OS buffers (a
> feature in OSes, for performance). This is cause we do not do an
> fsync() (i.e. calling force on the FileChannel for the block and
> metadata outputs), but rather just an output stream flush. In the
> future, via 2.0.1-alpha release (soon to come at this point) and
> onwards, the specific call hsync() will ensure that this is not the
> However, if you are OK with the OS buffers feature/caveat and
> primarily need syncing not for reliability but for readers, you may
> use the call hflush() and save on performance. One place where hsync()
> is to be preferred instead of hflush() is where you use WALs (for data
> reliability), and HBase is one such application. With hsync(), HBase
> can survive potential failures caused by major power failure cases
> (among others).
> Let us know if this clears it up for you!
> On Sat, Jun 9, 2012 at 4:58 AM, Mohit Anchlia <[EMAIL PROTECTED]>
> > I am wondering the role of sync in replication of data to other nodes.
> > client writes a line to a file in Hadoop, at this point file handle is
> > and sync has not been called. In this scenario is data also replicated as
> > defined by the replication factor to other nodes as well? I am wondering
> > at this point if crash occurs do I have data in other nodes?
> Harsh J