HDFS, mail # user - Differences between hflush & hsync()


Re: Differences between hflush & hsync()
Harsh J 2012-04-12, 12:43
In Hadoop 1.0 (from 0.20-append), there is just a single "sync(…)"
output-stream call. It does a metadata update to persist the data
already written to the under-construction blocks, and it flushes the
open file for the block at the DataNodes (DNs). It does *not* flush the
descriptor of that file at the OS level, i.e. it does not call fsync
(http://linux.die.net/man/2/fsync).

In Hadoop 2.0 (what is 0.23.x today), there are two APIs: hflush() and
hsync(). The former is akin to the old sync(…) call described above,
while the latter is designed to go one step further and call the fsync
syscall (http://linux.die.net/man/2/fsync) to ensure that the data is
really persisted. However, as of 0.23.2 at least, hsync() isn't
completely implemented and just calls hflush(), so the two currently
behave the same.
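
To make the flush-versus-fsync distinction concrete, here is a minimal
local-file analogy using only java.io. It is not HDFS code: the class
and method names (FlushVsSync, writeFlushed, writeSynced) are made up
for illustration. flush() stands in for the hflush()-style "hand the
bytes onward" step, and FileDescriptor.sync() stands in for the extra
fsync step that hsync() is designed to add.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration class; not part of Hadoop.
public class FlushVsSync {

    // Appends a record and flushes the user-space buffer. The bytes reach
    // the OS and are visible to readers, but may still sit in the OS page
    // cache rather than on the storage device.
    public static void writeFlushed(Path file, String record) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(file.toFile(), true);
             BufferedOutputStream out = new BufferedOutputStream(fos)) {
            out.write(record.getBytes(StandardCharsets.UTF_8));
            out.flush();
        }
    }

    // Appends a record, flushes, and then asks the OS to force the data to
    // the underlying device (typically implemented with fsync on Linux).
    public static void writeSynced(Path file, String record) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(file.toFile(), true);
             BufferedOutputStream out = new BufferedOutputStream(fos)) {
            out.write(record.getBytes(StandardCharsets.UTF_8));
            out.flush();          // user-space buffer -> OS
            fos.getFD().sync();   // OS buffers -> storage device
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("flush-vs-sync", ".log");
        writeFlushed(file, "flushed\n");
        writeSynced(file, "synced\n");
        System.out.println(Files.readAllLines(file)); // prints [flushed, synced]
        Files.delete(file);
    }
}
```

Both records are readable afterwards; the difference only matters if the
machine loses power before the OS writes its cache out, which is the
case the fsync step is meant to cover.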

See https://issues.apache.org/jira/browse/HDFS-265 for all the
discussion around this change between 1.0 and 2.0.

Also see the javadocs for both calls:
http://hadoop.apache.org/common/docs/r0.23.1/api/org/apache/hadoop/fs/FSDataOutputStream.html#hflush()
and http://hadoop.apache.org/common/docs/r0.23.1/api/org/apache/hadoop/fs/FSDataOutputStream.html#hsync()

Ticket https://issues.apache.org/jira/browse/HDFS-744 tracks
completion of the hsync() feature.

On Thu, Apr 12, 2012 at 3:59 PM, Inder Pall <[EMAIL PROTECTED]> wrote:
> Folks,
>
> Can someone share more technical details than what the javadoc
> covers?
> Also, which one should be used when?
>
> --
> Thanks,
> - Inder
>   Tech Platforms @Inmobi
>   Linkedin - http://goo.gl/eR4Ub

--
Harsh J