Re: hadoop 0.20 append - some clarifications
It is a bit confusing.

SequenceFile.Writer#sync isn't really a sync.

There is SequenceFile.Writer#syncFs, which is closer to what you might
expect sync to be.

Then there is HADOOP-6313, which specifies hflush and hsync.  Generally, if
you want portable code, you have to use a bit of reflection to figure out
what is available.
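
To make that concrete, a rough sketch of the reflection approach (the
helper class is hypothetical, not part of any Hadoop release; it tries the
0.21+ hflush from HADOOP-6313 first and falls back to the 0.20-append
sync):

    import java.io.IOException;
    import java.io.OutputStream;
    import java.lang.reflect.Method;

    public final class PortableSync {
        // Best-effort "push buffered bytes to the datanodes", across versions.
        public static void sync(OutputStream out) throws IOException {
            for (String name : new String[] {"hflush", "sync"}) {
                try {
                    Method m = out.getClass().getMethod(name);
                    m.invoke(out);  // found a usable flush method
                    return;
                } catch (NoSuchMethodException e) {
                    // this version doesn't have it; try the next name
                } catch (Exception e) {
                    throw new IOException(name + " failed", e);
                }
            }
            // Neither method exists: data is only guaranteed durable on close().
        }
    }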

On Thu, Feb 10, 2011 at 8:38 PM, Gokulakannan M <[EMAIL PROTECTED]> wrote:

>  Thanks Ted for clarifying.
>
> So *sync* just flushes the current buffers to the datanode and persists
> the block info in the namenode once per block, right?
>
>
>
> Regarding a reader being able to see unflushed data, I ran into an issue
> in the following scenario:
>
> 1. a writer is writing a *10MB* file (block size 2 MB)
>
> 2. the writer has written 4MB of the file (2 finalized blocks in *current* and
> nothing in the *blocksBeingWritten* directory in the DN), so 2 blocks are
> complete
>
> 3. the client calls addBlock for the 3rd block on the namenode but has not yet
> created an output stream to the DN (or written anything to the DN). At this
> point the namenode knows about the 3rd block but the datanode doesn't.
>
> 4. at point 3, a reader trying to read the file gets an exception and cannot
> read the file, since the datanode's getBlockInfo returns null to the client
> (of course the DN doesn't know about the 3rd block yet)
>
> In this situation the reader cannot see the file at all. But once writing of
> the block to the DN is in progress, the read succeeds.
>
> *Is this a bug that needs to be handled in the append branch?*
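
Until that is settled, a reader-side retry loop is one way to ride out the
window between addBlock on the namenode and the first byte reaching a
datanode. A sketch only (the retry count, the sleep, and whether the
exception surfaces at open() or at the first read are assumptions, not
verified 0.20-append behavior):

    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public final class RetryingOpen {
        // Retry while the last block is known to the NN but not yet to any DN.
        // Assumes attempts >= 1.
        public static FSDataInputStream open(FileSystem fs, Path path, int attempts)
                throws IOException, InterruptedException {
            IOException last = null;
            for (int i = 0; i < attempts; i++) {
                try {
                    FSDataInputStream in = fs.open(path);
                    in.read();   // force contact with a datanode
                    in.seek(0);  // rewind for the caller
                    return in;
                } catch (IOException e) {
                    last = e;    // the DN likely hasn't heard of the block yet
                    Thread.sleep(500);
                }
            }
            throw last;
        }
    }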
>
>
>
> >> -----Original Message-----
> >> From: Konstantin Boudnik [mailto:[EMAIL PROTECTED]]
> >> Sent: Friday, February 11, 2011 4:09 AM
> >>To: [EMAIL PROTECTED]
> >> Subject: Re: hadoop 0.20 append - some clarifications
>
> >> You might also want to check the append design doc published at HDFS-265
>
>
>
> I was asking about the hadoop 0.20 append branch. I suppose HDFS-265's
> design doc won't apply to it.
>
>
>  ------------------------------
>
> *From:* Ted Dunning [mailto:[EMAIL PROTECTED]]
> *Sent:* Thursday, February 10, 2011 9:29 PM
> *To:* [EMAIL PROTECTED]; [EMAIL PROTECTED]
> *Cc:* [EMAIL PROTECTED]
> *Subject:* Re: hadoop 0.20 append - some clarifications
>
>
>
> Correct is a strong word here.
>
>
>
> There is actually an HDFS unit test that checks to see if partially written
> and unflushed data is visible.  The basic rule of thumb is that you need to
> synchronize readers and writers outside of HDFS.  There is no guarantee that
> data is visible or invisible after writing, but there is a guarantee that it
> will become visible after sync or close.
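
In 0.20-append terms that guarantee looks roughly like this (the path and
record are made up; syncFs is the SequenceFile.Writer call from the append
branch):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public final class VisibilityDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path p = new Path("/tmp/visibility-demo.seq");

            SequenceFile.Writer w = SequenceFile.createWriter(
                    fs, conf, p, LongWritable.class, Text.class);
            w.append(new LongWritable(1), new Text("record one"));
            // A concurrent reader may or may not see "record one" here.
            w.syncFs();
            // From this point a new reader is guaranteed to see it.
            w.close();
            // close() extends the same guarantee to everything written.
        }
    }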
>
> On Thu, Feb 10, 2011 at 7:11 AM, Gokulakannan M <[EMAIL PROTECTED]> wrote:
>
> Is this the correct behavior, or is my understanding wrong?
>
>
>