|
|
-
Sync and Data Replication
Mohit Anchlia 2012-06-08, 23:28
I am wondering the role of sync in replication of data to other nodes. Say client writes a line to a file in Hadoop, at this point file handle is open and sync has not been called. In this scenario is data also replicated as defined by the replication factor to other nodes as well? I am wondering if at this point if crash occurs do I have data in other nodes?
-
Re: Sync and Data Replication
Harsh J 2012-06-09, 12:13
Hi Mohit,
> In this scenario is data also replicated as defined by the replication factor to other nodes as well? I am wondering if at this point if crash occurs do I have data in other nodes?
What kind of crash are you talking about here? A client crash or a cluster crash? If a cluster, is the loss you're thinking of one DN or all the replicating DNs?
If client fails to close a file due to a crash, it is auto-closed later (default is one hour) by the NameNode and whatever the client successfully wrote (i.e. into its last block) is then made available to readers at that point. If the client synced, then its last sync point is always available to readers and whatever it didn't sync is made available when the file is closed later by the NN. For DN failures, read on.
Replication in 1.x/0.20.x is done via pipelines. Its done regardless of sync() calls. All write packets are indeed sent to and acknowledged by each DN in the constructed pipeline as the write progresses. For a good diagram on the sequence here, see Figure 3.3 | Page 66 | Chapter 3: The Hadoop Distributed Filesystem, in Tom's "Hadoop: The Definitive Guide" (2nd ed. page nos. Gotta get 3rd ed. soon :))
The sync behavior is further explained under the 'Coherency Model' title at Page 68 | Chapter 3: The Hadoop Distributed Filesystem of the same book. Think of sync() more as a checkpoint done over the write pipeline, such that new readers can read the length of synced bytes immediately and that they are guaranteed to be outside of the DN application (JVM) buffers (i.e. flushed).
Some further notes, for general info: In 0.20.x/1.x releases, there's no hard-guarantee that the write buffer flushing done via sync ensures the data went to the *disk*. It may remain in the OS buffers (a feature in OSes, for performance). This is cause we do not do an fsync() (i.e. calling force on the FileChannel for the block and metadata outputs), but rather just an output stream flush. In the future, via 2.0.1-alpha release (soon to come at this point) and onwards, the specific call hsync() will ensure that this is not the case.
However, if you are OK with the OS buffers feature/caveat and primarily need syncing not for reliability but for readers, you may use the call hflush() and save on performance. One place where hsync() is to be preferred instead of hflush() is where you use WALs (for data reliability), and HBase is one such application. With hsync(), HBase can survive potential failures caused by major power failure cases (among others).
Let us know if this clears it up for you!
On Sat, Jun 9, 2012 at 4:58 AM, Mohit Anchlia <[EMAIL PROTECTED]> wrote: > I am wondering the role of sync in replication of data to other nodes. Say > client writes a line to a file in Hadoop, at this point file handle is open > and sync has not been called. In this scenario is data also replicated as > defined by the replication factor to other nodes as well? I am wondering if > at this point if crash occurs do I have data in other nodes?
-- Harsh J
-
Re: Sync and Data Replication
Mohit Anchlia 2012-06-09, 17:41
Thanks Harsh for detailed info. It clears things up. Only thing from those page is concerning is what happens when client crashes. It says you could lose upto a block worth of information. Is this still true given that NN would auto close the file?
Is it a good practice to reduce NN default value so that it auto-closes before 1 hr.
Regarding OS cache, I think it should be ok since chances of loosing replica nodes all at the same time is low. On Sat, Jun 9, 2012 at 5:13 AM, Harsh J <[EMAIL PROTECTED]> wrote:
> Hi Mohit, > > > In this scenario is data also replicated as defined by the replication > factor to other nodes as well? I am wondering if at this point if crash > occurs do I have data in other nodes? > > What kind of crash are you talking about here? A client crash or a > cluster crash? If a cluster, is the loss you're thinking of one DN or > all the replicating DNs? > > If client fails to close a file due to a crash, it is auto-closed > later (default is one hour) by the NameNode and whatever the client > successfully wrote (i.e. into its last block) is then made available > to readers at that point. If the client synced, then its last sync > point is always available to readers and whatever it didn't sync is > made available when the file is closed later by the NN. For DN > failures, read on. > > Replication in 1.x/0.20.x is done via pipelines. Its done regardless > of sync() calls. All write packets are indeed sent to and acknowledged > by each DN in the constructed pipeline as the write progresses. For a > good diagram on the sequence here, see Figure 3.3 | Page 66 | Chapter > 3: The Hadoop Distributed Filesystem, in Tom's "Hadoop: The Definitive > Guide" (2nd ed. page nos. Gotta get 3rd ed. soon :)) > > The sync behavior is further explained under the 'Coherency Model' > title at Page 68 | Chapter 3: The Hadoop Distributed Filesystem of the > same book. Think of sync() more as a checkpoint done over the write > pipeline, such that new readers can read the length of synced bytes > immediately and that they are guaranteed to be outside of the DN > application (JVM) buffers (i.e. flushed). > > Some further notes, for general info: In 0.20.x/1.x releases, there's > no hard-guarantee that the write buffer flushing done via sync ensures > the data went to the *disk*. It may remain in the OS buffers (a > feature in OSes, for performance). This is cause we do not do an > fsync() (i.e. calling force on the FileChannel for the block and > metadata outputs), but rather just an output stream flush. In the > future, via 2.0.1-alpha release (soon to come at this point) and > onwards, the specific call hsync() will ensure that this is not the > case. > > However, if you are OK with the OS buffers feature/caveat and > primarily need syncing not for reliability but for readers, you may > use the call hflush() and save on performance. One place where hsync() > is to be preferred instead of hflush() is where you use WALs (for data > reliability), and HBase is one such application. With hsync(), HBase > can survive potential failures caused by major power failure cases > (among others). > > Let us know if this clears it up for you! > > On Sat, Jun 9, 2012 at 4:58 AM, Mohit Anchlia <[EMAIL PROTECTED]> > wrote: > > I am wondering the role of sync in replication of data to other nodes. > Say > > client writes a line to a file in Hadoop, at this point file handle is > open > > and sync has not been called. In this scenario is data also replicated as > > defined by the replication factor to other nodes as well? I am wondering > if > > at this point if crash occurs do I have data in other nodes? > > > > -- > Harsh J >
-
Re: Sync and Data Replication
Harsh J 2012-06-10, 16:39
Mohit,
On Sat, Jun 9, 2012 at 11:11 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote: > Thanks Harsh for detailed info. It clears things up. Only thing from those > page is concerning is what happens when client crashes. It says you could > lose upto a block worth of information. Is this still true given that NN > would auto close the file?
Where does it say this exactly? It is true that immediate readers will not get the last block (as it remains open and uncommitted), but once the lease recovery kicks in the file is closed successfully and the last block is indeed made available, so there's no 'data loss'.
> Is it a good practice to reduce NN default value so that it auto-closes > before 1 hr.
I've not seen people do this/need to do this. Most don't run into such a situation and it is vital to properly close() files or sync() on file streams before making it available to readers. HBase manages open files during WAL-recovery using lightweight recoverLease APIs that were added for its benefit, so it doesn't need to wait for an hour for WALs to close and recover data.
-- Harsh J
-
Re: Sync and Data Replication
Mohit Anchlia 2012-06-10, 19:17
On Sun, Jun 10, 2012 at 9:39 AM, Harsh J <[EMAIL PROTECTED]> wrote:
> Mohit, > > On Sat, Jun 9, 2012 at 11:11 PM, Mohit Anchlia <[EMAIL PROTECTED]> > wrote: > > Thanks Harsh for detailed info. It clears things up. Only thing from > those > > page is concerning is what happens when client crashes. It says you could > > lose upto a block worth of information. Is this still true given that NN > > would auto close the file? > > Where does it say this exactly? It is true that immediate readers will > not get the last block (as it remains open and uncommitted), but once > the lease recovery kicks in the file is closed successfully and the > last block is indeed made available, so there's no 'data loss'. >
I saw it in "Coherency Model" -> "consequences of application design" paragraph.
Thanks for the information. It at least helps me in that I don't have to worry about the data loss when sync is not closed.
> > > Is it a good practice to reduce NN default value so that it auto-closes > > before 1 hr. > > I've not seen people do this/need to do this. Most don't run into such > a situation and it is vital to properly close() files or sync() on > file streams before making it available to readers. HBase manages open > files during WAL-recovery using lightweight recoverLease APIs that > were added for its benefit, so it doesn't need to wait for an hour for > WALs to close and recover data. > > -- > Harsh J >
|
|