NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

HBase user mailing list: Does Hadoop 1.0.4 provide a durable sync for HBase-0.94.6?


Re: Does Hadoop 1.0.4 provide a durable sync for HBase-0.94.6?
Hi,

HDFS has two interfaces for durability, hflush() and hsync():

hflush(): Flush the data packet down the datanode pipeline and wait for
acks.
hsync(): Flush the data packet down the pipeline, have the datanodes execute
an FSYNC equivalent, and wait for acks.
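To make the difference between the two calls concrete, here is a toy Java model. Everything in it is hypothetical: PipelineModel, DataNode and their methods are illustrative names, not the HDFS client API. Each simulated datanode keeps received packets in a volatile buffer standing in for memory and copies them to a durable buffer only on an fsync-like call. It ignores acks, block allocation, and the background OS writeback that eventually puts hflushed data on disk anyway.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of HDFS write durability; not the real DFSClient API.
public class PipelineModel {

    // One simulated datanode: "memory" is lost on power failure, "disk" is not.
    static final class DataNode {
        private final StringBuilder memory = new StringBuilder();
        private final StringBuilder disk = new StringBuilder();

        void receive(String packet) {
            memory.append(packet);
        }

        // Stand-in for the FSYNC equivalent: move buffered bytes to durable storage.
        void fsync() {
            disk.append(memory);
            memory.setLength(0);
        }

        void powerFailure() {
            memory.setLength(0);
        }

        // Bytes this replica can still serve: durable bytes first, then buffered ones.
        String readable() {
            return disk.toString() + memory;
        }
    }

    private final List<DataNode> pipeline = new ArrayList<>();
    private final StringBuilder clientBuffer = new StringBuilder(); // not yet sent

    PipelineModel(int replicas) {
        for (int i = 0; i < replicas; i++) {
            pipeline.add(new DataNode());
        }
    }

    void write(String data) {
        clientBuffer.append(data);
    }

    // hflush(): send the packet down the pipeline; every replica holds it in memory.
    // Acks are implicit here: each receive() returns only after the node has the data.
    void hflush() {
        String packet = clientBuffer.toString();
        clientBuffer.setLength(0);
        for (DataNode dn : pipeline) {
            dn.receive(packet);
        }
    }

    // hsync(): the same as hflush(), plus an fsync equivalent on every replica.
    void hsync() {
        hflush();
        for (DataNode dn : pipeline) {
            dn.fsync();
        }
    }

    // A single datanode loses power.
    void powerFailure(int index) {
        pipeline.get(index).powerFailure();
    }

    // Data-center-wide power loss: the client buffer and every replica's memory are gone.
    void powerFailureEverywhere() {
        clientBuffer.setLength(0);
        for (DataNode dn : pipeline) {
            dn.powerFailure();
        }
    }

    // Longest byte sequence still available on any replica.
    String recoverable() {
        String best = "";
        for (DataNode dn : pipeline) {
            String r = dn.readable();
            if (r.length() > best.length()) {
                best = r;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        PipelineModel flushed = new PipelineModel(3);
        flushed.write("edit-1");
        flushed.hflush();
        flushed.powerFailure(0);
        System.out.println("hflush, one node down:  '" + flushed.recoverable() + "'");
        flushed.powerFailureEverywhere();
        System.out.println("hflush, all nodes down: '" + flushed.recoverable() + "'");

        PipelineModel synced = new PipelineModel(3);
        synced.write("edit-1");
        synced.hsync();
        synced.powerFailureEverywhere();
        System.out.println("hsync, all nodes down:  '" + synced.recoverable() + "'");
    }
}
```

In this model, hflushed data survives losing one of the three datanodes but not a simultaneous power failure on all of them, while hsynced data survives both.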

There is ongoing work on adding a durability API to HBase; see HBASE-7801
and HBASE-8375.

However, as Stack mentioned, until HBASE-5954 is fixed, HBase cannot make
use of the hsync() API. I want to rebase the patch in HBASE-5954, but it
might take some more time.

The good news is that hflush, the current default, although not perfect,
makes sure that the update is sent to 3 replicas, so unless there is a data
center power failure or similar, the data will make it to disk pretty
quickly.
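The gap between "on three replicas" and "on disk" mirrors one that exists on a single machine, between bytes sitting in the OS page cache and bytes forced out with fsync. As a local analogy only (plain java.io on one machine, not HDFS; the class name LocalSyncDemo and its append helper are made up for illustration), the sketch below writes one record without forcing it and one with FileDescriptor.sync(), the JDK's fsync-style call.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Local-disk analogy only (plain java.io, not the HDFS client).
public class LocalSyncDemo {

    // Appends a record. write() hands the bytes to the operating system, whose
    // page cache makes them visible to readers but can still lose them on power
    // failure (compare hflush data held in datanode memory). When forceToDisk is
    // true, FileDescriptor.sync() asks the OS to push the file's buffered data to
    // the storage device (compare the FSYNC equivalent that hsync requests).
    static void append(File file, String record, boolean forceToDisk) throws IOException {
        try (FileOutputStream out = new FileOutputStream(file, true)) {
            out.write(record.getBytes(StandardCharsets.UTF_8));
            // FileOutputStream keeps no user-space buffer, so flush() adds no durability.
            out.flush();
            if (forceToDisk) {
                out.getFD().sync();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("wal-demo", ".log");
        file.deleteOnExit();
        append(file, "edit-1\n", false); // visible, but only as durable as the OS cache
        append(file, "edit-2\n", true);  // on common platforms this also forces edit-1
        String contents = new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8);
        System.out.print(contents);
    }
}
```

Both records read back identically; the difference lies only in what would survive a crash, which a simple read-back cannot show.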

Hope this helps.
Enis
On Tue, May 28, 2013 at 9:53 AM, Stack <[EMAIL PROTECTED]> wrote:

> On Tue, May 28, 2013 at 7:09 AM, jingguo yao <[EMAIL PROTECTED]> wrote:
>
> > Section 2.1.3 says that Hadoop 1.0.4 works with HBase-0.94.x [1], and
> > Section 2.1.3.3 says that 1.0.4 has a working durable sync. But when I
> > check the source code of DFSClient.DFSOutputStream's sync method, I
> > find the following javadoc:
> >
> >     /**
> >      * All data is written out to datanodes. It is not guaranteed
> >      * that data has been flushed to persistent store on the
> >      * datanode. Block allocations are persisted on namenode.
> >      */
> >
> > So it seems that sync does not provide a durable sync, which
> > contradicts [1].
> >
> > Can anybody help me clear up this confusion? Thanks.
>
>
>
> This issue is probably the best source for the state of sync in hbase (and
> hdfs): https://issues.apache.org/jira/browse/HBASE-5954
>
> In short, the refguide is misleading (let me fix it): 1.0.4 does have a
> sync, but it is just a sync to the memory of three datanodes, not a true
> fsync out to disk. The issue cited above tracks the work that our Lars
> and others have contributed to HDFS to add fsync support.
>
> Yours,
> St.Ack
>