HBase, mail # dev - Calling o/s.flush() in HLog.sync()?


Re: Calling o/s.flush() in HLog.sync()?
Jonathan Hsieh 2013-11-15, 07:37
I find that answer unsatisfying and think it could use some elaboration.

I think it's there because of the Java OutputStream convention, not so much
because of Hadoop.

The output object in ProtobufLogWriter is an HDFS FSDataOutputStream.
FSDataOutputStream essentially wraps a Java OutputStream [1] (which has only
write(byte[]) and write(int) methods), providing a Java DataOutputStream [2]
that adds convenient writeXxx methods for serializing primitive datatypes
(int, float, etc.). For efficiency, you'd usually wrap the OutputStream in a
BufferedOutputStream [3], which adds an in-memory buffer and flushes to the
underlying stream when a certain size is reached or flush() is called.
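To make that chain concrete, here's a minimal sketch (not HBase code; the
ByteArrayOutputStream just stands in for the raw sink) showing that bytes
written through DataOutputStream sit in the buffer until flush() propagates
them down:

```java
import java.io.ByteArrayOutputStream;
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class FlushChainDemo {
    public static void main(String[] args) throws IOException {
        // Raw sink; stands in for the stream the FS hands back.
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        // Buffer so small writes don't hit the sink one byte at a time.
        BufferedOutputStream buffered = new BufferedOutputStream(raw, 8192);
        // DataOutputStream adds writeInt/writeFloat/etc. on top.
        DataOutputStream out = new DataOutputStream(buffered);

        out.writeInt(42);  // 4 bytes, still sitting in the buffer
        System.out.println("before flush: " + raw.size());  // prints 0
        out.flush();       // propagates down through the chain to the sink
        System.out.println("after flush: " + raw.size());   // prints 4
    }
}
```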

Since it gets the stream from the FS object, I bet it could have
implementations other than the DFSOutputStream you saw -- ones which would
require the flush.

Jon.

[1] http://docs.oracle.com/javase/7/docs/api/java/io/OutputStream.html
[2] http://docs.oracle.com/javase/7/docs/api/java/io/DataOutputStream.html
[3]
http://docs.oracle.com/javase/7/docs/api/java/io/BufferedOutputStream.html
On Thu, Nov 7, 2013 at 2:56 PM, Ted Yu <[EMAIL PROTECTED]> wrote:

> Himanshu:
> See
>
> http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/io/DataOutputStream.java#DataOutputStream.flush%28%29
> The flush() call results in OutputStream.flush().
>
> Cheers
>
>
> On Mon, Nov 4, 2013 at 9:11 PM, Himanshu Vashishtha <[EMAIL PROTECTED]
> >wrote:
>
> > Looking at ProtobufLogWriter class, it looks like the call to flush() in
> > the sync method is a no-op.
> >
> >
> >
> https://github.com/apache/hbase/blob/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogWriter.java#L134
> >
> > The underlying output stream is DFSOutputStream, which doesn't implement
> > flush().
> >
> > And it calls sync() anyway, which ensures the data is written to the DNs
> > (cache).
> >
> > Previously with SequenceFile$Writer, it wrote data to the output stream
> > (using Writables#write) and invoked sync/hflush.
> >
> >
> https://github.com/apache/hadoop-common/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/SequenceFile.java#L1314
> >
> > Is there a reason we have this call here? Please let me know if I missed
> > any context.
> >
> > Thanks,
> > Himanshu
> >
>

--
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// [EMAIL PROTECTED]