I found this answer unsatisfying and think it could use some elaboration.
I think it's there because of Java OutputStream convention and not so much
because of Hadoop.
The output object in ProtobufLogWriter is an HDFS FSDataOutputStream. The
HDFS FSDataOutputStream essentially wraps a Java OutputStream (which has
only raw write(int) and write(byte[]) methods), exposing it as a Java
DataOutputStream, which provides the convenient writeXxx methods for
serializing primitive datatypes (int, float, etc.). For efficiency, you'd
usually wrap the OutputStream in a BufferedOutputStream, which adds an
in-memory buffer and flushes to the underlying output stream when a certain
size is reached or flush() is called.
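
To make the layering concrete, here's a minimal sketch using plain java.io
classes (the file name and values are just for illustration):

import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class StreamLayering {
    public static void main(String[] args) throws IOException {
        // Raw OutputStream: only the low-level write(int) and
        // write(byte[]) methods.
        FileOutputStream raw = new FileOutputStream("/tmp/demo.bin");

        // BufferedOutputStream: collects bytes in memory and only pushes
        // them to the underlying stream when the buffer fills or flush()
        // is called.
        BufferedOutputStream buffered = new BufferedOutputStream(raw, 4096);

        // DataOutputStream: adds the convenient writeXxx methods for
        // primitive types on top of whatever stream it wraps.
        DataOutputStream out = new DataOutputStream(buffered);

        out.writeInt(42);
        out.writeFloat(3.14f);
        out.writeUTF("hello");

        // Without this flush(), the bytes above may still be sitting in
        // the BufferedOutputStream's in-memory buffer.
        out.flush();
        out.close();
    }
}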
Since it gets the stream from the FileSystem object, I bet it could have
implementations other than just the DFSOutputStream you saw -- ones that do
require the flush.
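
That's why the defensive pattern below makes sense. This is just a sketch
of the idea (the WalSync helper class is mine, not HBase's -- ProtobufLogWriter
does this inline, and I'm using the Hadoop 2 hflush() name for the sync call):

import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;

public final class WalSync {
    private WalSync() {}

    public static void sync(FSDataOutputStream output) throws IOException {
        // No-op for DFSOutputStream, but meaningful if the FileSystem
        // handed us a stream that buffers in memory.
        output.flush();
        // Push the data out to the DataNodes' caches.
        output.hflush();
    }
}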
On Thu, Nov 7, 2013 at 2:56 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
> The flush() call results in OutputStream.flush().
> On Mon, Nov 4, 2013 at 9:11 PM, Himanshu Vashishtha <[EMAIL PROTECTED]> wrote:
> > Looking at the ProtobufLogWriter class, it looks like the call to flush()
> > in the sync method is a no-op.
> > The underlying output stream is DFSOutputStream, which doesn't override
> > flush().
> > And it calls sync() anyway, which ensures the data is written to the
> > DataNodes (into their caches).
> > Previously with SequenceFile$Writer, it wrote data to the output stream
> > (using Writables#write) and invoked sync/hflush.
> > Is there a reason we have this call here? Please let me know if I'm
> > missing context.
> > Thanks,
> > Himanshu
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// [EMAIL PROTECTED]