HBase >> mail # user >> sync on writes


Mohit Anchlia 2012-08-01, 01:09
Alex Baranau 2012-08-01, 13:16
Jerry Lam 2012-08-01, 14:10
lars hofhansl 2012-08-01, 16:29
Re: sync on writes
On Wed, Aug 1, 2012 at 9:29 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> "sync" is a fluffy term in HDFS. HDFS has hsync and hflush.
> hflush forces all current changes at a DFSClient to all replica nodes (but
> not to disk).
>
> Until HDFS-744, hsync was identical to hflush. After HDFS-744, hsync
> can be used to force data to disk at the replicas.
>
>
> When HBase refers to "sync" the hflush semantics are meant (at least until
> HBASE-5954 is finished).
> I.e. a sync here ensures that the replica nodes have seen the changes,
> which is what you want.
>
>
> So when you say "since another copy is always there on the replica nodes",
> that is only guaranteed after an hflush (again, which HBase calls sync).
>
>
> I've also written about this here:
> http://hadoop-hbase.blogspot.com/2012/05/hbase-hdfs-and-durable-sync.html
>
> -- Lars
>
>
>
Thanks, this post is very helpful.

>
> ________________________________
>  From: Mohit Anchlia <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Tuesday, July 31, 2012 6:09 PM
> Subject: sync on writes
>
> The HBase book mentions that the default write behaviour is to
> call sync on each node before sending replica copies to the nodes in the
> pipeline. Is there a reason this was kept as the default? If data is
> written to multiple nodes, the likelihood of losing data is really
> low, since another copy is always there on the replica nodes. Is it OK to
> make this sync async, and is it advisable?
>
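
To make the distinction in Lars's reply concrete, here is a minimal toy model in Java. It is only a sketch of the semantics described in the thread and does not use the HDFS client API: SyncModel, Replica, clientCrash and powerLossOnAllReplicas are hypothetical names invented for illustration, and hsync is modeled with the post-HDFS-744 behaviour (force to disk at the replicas). The point it tries to show is that "another copy is always there on the replica nodes" only holds for edits that have been hflushed.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of hflush/hsync semantics as described in the thread; not the HDFS implementation. */
class SyncModel {
    /** One replica: everything it has received, and how much of that is on its disk. */
    static final class Replica {
        final List<String> received = new ArrayList<>();
        int onDisk = 0; // number of leading entries in 'received' that were forced to disk
    }

    private final List<String> clientBuffer = new ArrayList<>();
    private final List<Replica> replicas = new ArrayList<>();

    SyncModel(int replicaCount) {
        for (int i = 0; i < replicaCount; i++) {
            replicas.add(new Replica());
        }
    }

    /** A plain write only reaches the client-side buffer. */
    void write(String edit) {
        clientBuffer.add(edit);
    }

    /** hflush: push all buffered edits to every replica (in memory, not to disk). */
    void hflush() {
        for (Replica r : replicas) {
            r.received.addAll(clientBuffer);
        }
        clientBuffer.clear();
    }

    /** hsync (assumed post-HDFS-744 semantics): hflush, then force each replica to disk. */
    void hsync() {
        hflush();
        for (Replica r : replicas) {
            r.onDisk = r.received.size();
        }
    }

    /** The writer dies: anything still sitting in its buffer is gone. */
    void clientCrash() {
        clientBuffer.clear();
    }

    /** Every replica loses power at once: only edits forced to disk survive. */
    void powerLossOnAllReplicas() {
        for (Replica r : replicas) {
            r.received.subList(r.onDisk, r.received.size()).clear();
        }
    }

    /** Edits that could still be read back from the first replica. */
    List<String> recoverable() {
        return new ArrayList<>(replicas.get(0).received);
    }

    public static void main(String[] args) {
        SyncModel wal = new SyncModel(3);
        wal.write("edit-1");
        wal.hsync();            // on disk at all replicas
        wal.write("edit-2");
        wal.hflush();           // seen by all replicas, but not on disk
        wal.write("edit-3");    // never flushed
        wal.clientCrash();
        System.out.println(wal.recoverable()); // [edit-1, edit-2]
        wal.powerLossOnAllReplicas();
        System.out.println(wal.recoverable()); // [edit-1]
    }
}
```

In this model, skipping the flush (the "async" option asked about) leaves edit-3 in the same position as any unflushed write: it exists only at the writer, so no replica can serve it after the writer fails.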