HBase >> mail # dev >> Calling o/s.flush() in HLog.sync()?


Re: Calling o/s.flush() in HLog.sync()?
I found this answer unsatisfying; it could use some elaboration.

I think it's there because of the Java OutputStream convention and not so
much because of Hadoop.

The output object in ProtobufLogWriter is an HDFS FSDataOutputStream.  The
HDFS FSDataOutputStream essentially wraps a Java OutputStream [1] (which
offers only the byte[] and int write methods) and exposes it as a Java
DataOutputStream [2], which provides nice writeXxx methods for
serializing primitive datatypes (int, float, etc.).  For efficiency, you'd
usually wrap the OutputStream with a BufferedOutputStream [3], which adds an
in-memory buffer and flushes to the underlying output stream when a certain
size is reached or flush() is called.
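
To make the buffering point concrete, here is a minimal sketch using only
java.io classes.  It is not HBase or HDFS code: the class name
BufferedFlushDemo, the 64-byte buffer, and the ByteArrayOutputStream standing
in for the real sink are all assumptions chosen for illustration.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical demo class; not part of HBase or Hadoop.
class BufferedFlushDemo {

    /** Returns the sink size observed {before flush(), after flush()}. */
    static int[] writeAndFlush() throws IOException {
        // In-memory stand-in for the underlying output stream.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // DataOutputStream -> BufferedOutputStream (64-byte buffer) -> sink
        DataOutputStream out =
            new DataOutputStream(new BufferedOutputStream(sink, 64));

        out.writeInt(42);   // 4 bytes, held in the buffer
        out.writeLong(7L);  // 8 bytes, still held in the buffer

        int before = sink.size(); // nothing has reached the sink yet
        out.flush();              // pushes the 12 buffered bytes down
        int after = sink.size();

        return new int[] {before, after};
    }

    public static void main(String[] args) throws IOException {
        int[] sizes = writeAndFlush();
        System.out.println("before flush: " + sizes[0]
            + ", after flush: " + sizes[1]);
    }
}
```

With a buffering layer in the chain, the writes are invisible to the
underlying stream until flush() runs, which is the usual reason the
convention exists.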

Since the stream comes from the FS object, I bet it could have
implementations other than just the DFSOutputStream you saw, and some of
those may require the flush.
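
A rough sketch of that argument, again with plain java.io only.  The two
inner stream classes are hypothetical stand-ins, not real Hadoop classes: one
inherits the do-nothing OutputStream.flush(), the other overrides it the way
some other FS implementation might.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical demo class; the inner streams are illustrative stand-ins.
class FlushDelegationDemo {

    /** Inherits OutputStream.flush(), which does nothing. */
    static class NoFlushStream extends OutputStream {
        int bytes = 0;
        @Override public void write(int b) { bytes++; }
    }

    /** Overrides flush() and counts how often it is invoked. */
    static class CountingFlushStream extends NoFlushStream {
        int flushes = 0;
        @Override public void flush() { flushes++; }
    }

    /** Flushes through a DataOutputStream; returns flush calls seen by the inner stream. */
    static int flushesSeenByOverridingStream() throws IOException {
        CountingFlushStream inner = new CountingFlushStream();
        DataOutputStream out = new DataOutputStream(inner);
        out.writeInt(1);
        out.flush(); // DataOutputStream.flush() delegates to inner.flush()
        return inner.flushes;
    }

    /** Same call against a stream without its own flush(); returns bytes written. */
    static int bytesSeenByPlainStream() throws IOException {
        NoFlushStream inner = new NoFlushStream();
        DataOutputStream out = new DataOutputStream(inner);
        out.writeInt(1);
        out.flush(); // harmless: falls through to the no-op base method
        return inner.bytes;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("flush calls delegated: "
            + flushesSeenByOverridingStream());
        System.out.println("bytes written to plain stream: "
            + bytesSeenByPlainStream());
    }
}
```

The caller's code is identical in both cases; whether flush() does any work
depends entirely on the stream the FS object hands back.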

Jon.

[1] http://docs.oracle.com/javase/7/docs/api/java/io/OutputStream.html
[2] http://docs.oracle.com/javase/7/docs/api/java/io/DataOutputStream.html
[3] http://docs.oracle.com/javase/7/docs/api/java/io/BufferedOutputStream.html

On Thu, Nov 7, 2013 at 2:56 PM, Ted Yu <[EMAIL PROTECTED]> wrote:

> Himanshu:
> See
>
> http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/io/DataOutputStream.java#DataOutputStream.flush%28%29
> The flush() call results in OutputStream.flush().
>
> Cheers
>
>
> On Mon, Nov 4, 2013 at 9:11 PM, Himanshu Vashishtha <[EMAIL PROTECTED]> wrote:
>
> > Looking at ProtobufLogWriter class, it looks like the call to flush() in
> > the sync method is a noop.
> >
> >
> >
> > https://github.com/apache/hbase/blob/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogWriter.java#L134
> >
> > The underlying output stream is DFSOutputStream, which doesn't implement
> > flush().
> >
> > And, it calls sync() anyway, which ensures the data is written to DN's
> > (cache).
> >
> > Previously, with SequenceFile$Writer, it wrote data to the output stream
> > (using Writables#write) and invoked sync/hflush.
> >
> >
> > https://github.com/apache/hadoop-common/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/SequenceFile.java#L1314
> >
> > Is there a reason we have this call here? Please let me know if I am
> > missing any context.
> >
> > Thanks,
> > Himanshu
> >
>

--
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// [EMAIL PROTECTED]