HBase, mail # user - Does Hadoop 1.0.4 provide a durable sync for HBase-0.94.6?


Re: Does Hadoop 1.0.4 provide a durable sync for HBase-0.94.6?
Enis Söztutar 2013-05-28, 22:19
Hi,

HDFS has two interfaces for durability, hflush and hsync:

hflush(): Flush the data packet down the datanode pipeline and wait for the
acks.
hsync(): Flush the data packet down the pipeline, have the datanodes execute
an fsync equivalent, and wait for the acks.
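
To make the difference concrete, here is a minimal local-file analogy in plain Java. It does not use the HDFS client API at all: writeVisible() and writeDurable() are hypothetical helper names, and a java.nio FileChannel on a temp file stands in for the datanode side. A plain write hands the bytes to the OS, where they may still sit in the page cache (roughly what hflush gives you on each datanode), while force(true) asks the OS to push them to the storage device (roughly the fsync step that hsync adds).

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Local-file analogy only; none of these names come from HDFS or HBase.
public class FlushVsSyncSketch {

    // Analogue of hflush(): the bytes leave the application and reach the OS,
    // but they may still be in the page cache rather than on disk.
    static void writeVisible(FileChannel ch, String record) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(record.getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining()) {
            ch.write(buf);
        }
    }

    // Analogue of hsync(): same write, then ask the OS to force the file
    // content and metadata to the storage device before returning.
    static void writeDurable(FileChannel ch, String record) throws IOException {
        writeVisible(ch, record);
        ch.force(true);
    }

    // Writes one record each way to a temp file and returns the file content.
    static String demo() throws IOException {
        Path wal = Files.createTempFile("wal-sketch", ".log");
        try (FileChannel ch = FileChannel.open(wal, StandardOpenOption.WRITE)) {
            writeVisible(ch, "edit-1\n");
            writeDurable(ch, "edit-2\n");
        }
        String content = new String(Files.readAllBytes(wal), StandardCharsets.UTF_8);
        Files.delete(wal);
        return content;
    }

    public static void main(String[] args) throws IOException {
        System.out.print(demo());
    }
}
```

Both records are readable afterwards in either case; the difference only shows up if the machine loses power between the write and the point where the OS would have flushed on its own.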

There is some work on adding a Durability API in HBase: see HBASE-7801 and
HBASE-8375.

However, as Stack mentioned, until HBASE-5954 is fixed, HBase cannot make
use of the hsync() API. I want to rebase the patch in HBASE-5954, but it
might take some more time.

The good news is that hflush, which is the current default, although not
perfect, makes sure that the update is sent to 3 replicas, so unless there
is a data center power failure or similar, the data will make it onto the
disks pretty quickly.

Hope this helps.
Enis
On Tue, May 28, 2013 at 9:53 AM, Stack <[EMAIL PROTECTED]> wrote:

> On Tue, May 28, 2013 at 7:09 AM, jingguo yao <[EMAIL PROTECTED]> wrote:
>
> > Section 2.1.3 says that Hadoop 1.0.4 works with HBase-0.94.x [1]. And
> > Section 2.1.3.3 says that 1.0.4 has a working durable sync. But when I
> > check the source code of DFSClient.DFSOutputStream's sync method, I
> > find the following javadoc:
> >
> >     /**
> >      * All data is written out to datanodes. It is not guaranteed
> >      * that data has been flushed to persistent store on the
> >      * datanode. Block allocations are persisted on namenode.
> >      */
> >
> > So it seems that sync does not provide a durable sync. This contradicts
> > [1].
> >
> > Can anybody help me with this confusion? Thanks.
>
>
>
> This issue is probably the best source for the state of sync in hbase (and
> hdfs): https://issues.apache.org/jira/browse/HBASE-5954
>
> In short, the refguide is misleading -- let me fix -- as 1.0.4 indeed has a
> sync but it is just a sync to the memory of three datanodes, not a true
> fsync out to disk.  The above-cited issue tracks the work that our Lars
> and others have contributed to HDFS to add fsync support.
>
> Yours,
> St.Ack
>