Hadoop, mail # user - hadoop 0.20 append - some clarifications


Re: hadoop 0.20 append - some clarifications
Ted Dunning 2011-02-11, 05:33
It is a bit confusing.

SequenceFile.Writer#sync isn't really sync.

There is SequenceFile.Writer#syncFs which is more what you might expect to
be sync.

Then there is HADOOP-6313, which specifies hflush and hsync.  Generally, if
you want portable code, you have to use a bit of reflection to figure out
what is available.
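A minimal sketch of that reflection probe, assuming you want to call the strongest flush primitive the running Hadoop version offers (the method names hflush, syncFs, and sync come from this thread and HADOOP-6313; the helper class and its name are hypothetical):

```java
import java.lang.reflect.Method;

public class PortableFlush {
    // Probe method names in order of preference: hflush (HADOOP-6313),
    // then the older syncFs, then plain sync as a last resort.
    static final String[] CANDIDATES = {"hflush", "syncFs", "sync"};

    /** Returns the name of the first candidate method the stream's class
     *  declares, or null if none of them exist. */
    public static String findFlushMethod(Object stream) {
        for (String name : CANDIDATES) {
            try {
                Method m = stream.getClass().getMethod(name);
                return m.getName();
            } catch (NoSuchMethodException ignored) {
                // keep probing the older names
            }
        }
        return null;
    }

    /** Invokes the best available flush primitive on the stream. */
    public static void flush(Object stream) throws Exception {
        String name = findFlushMethod(stream);
        if (name == null) {
            throw new UnsupportedOperationException("no flush primitive found");
        }
        stream.getClass().getMethod(name).invoke(stream);
    }
}
```

In real code the argument would be the FSDataOutputStream (or SequenceFile.Writer) you are writing to; probing once at startup and caching the Method avoids the reflection cost per call.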

On Thu, Feb 10, 2011 at 8:38 PM, Gokulakannan M <[EMAIL PROTECTED]> wrote:

>  Thanks Ted for clarifying.
>
> So *sync* just flushes the current buffers to the datanode and persists
> the block info in the namenode once per block, doesn't it?
>
>
>
> Regarding a reader being able to see unflushed data, I faced an issue in
> the following scenario:
>
> 1. a writer is writing a *10MB* file (block size 2 MB)
>
> 2. the writer has written up to 4 MB of the file (2 finalized blocks in
> *current* and nothing in the *blocksBeingWritten* directory on the DN), so
> 2 blocks are written
>
> 3. the client calls addBlock for the 3rd block on the namenode but has not
> yet created an output stream to the DN (or written anything to the DN). At
> this point the namenode knows about the 3rd block but the datanode doesn't.
>
> 4. while in state 3, a reader trying to read the file gets an exception
> and cannot read the file, because the datanode's getBlockInfo returns null
> to the client (of course the DN doesn't know about the 3rd block yet)
>
> In this situation the reader cannot see the file at all, but once writing
> of the block is in progress, the read succeeds.
>
> *Is this a bug that needs to be handled in append branch?*
>
>
>
> >> -----Original Message-----
> >> From: Konstantin Boudnik [mailto:[EMAIL PROTECTED]]
> >> Sent: Friday, February 11, 2011 4:09 AM
> >> To: [EMAIL PROTECTED]
> >> Subject: Re: hadoop 0.20 append - some clarifications
>
> >> You might also want to check append design doc published at HDFS-265
>
>
>
> I was asking about the hadoop 0.20 append branch. I suppose HDFS-265's
> design doc won't apply to it.
>
>
>  ------------------------------
>
> *From:* Ted Dunning [mailto:[EMAIL PROTECTED]]
> *Sent:* Thursday, February 10, 2011 9:29 PM
> *To:* [EMAIL PROTECTED]; [EMAIL PROTECTED]
> *Cc:* [EMAIL PROTECTED]
> *Subject:* Re: hadoop 0.20 append - some clarifications
>
>
>
> Correct is a strong word here.
>
>
>
> There is actually an HDFS unit test that checks to see if partially written
> and unflushed data is visible.  The basic rule of thumb is that you need to
> synchronize readers and writers outside of HDFS.  There is no guarantee that
> data is visible or invisible after writing, but there is a guarantee that it
> will become visible after sync or close.
>
> On Thu, Feb 10, 2011 at 7:11 AM, Gokulakannan M <[EMAIL PROTECTED]> wrote:
>
> Is this the correct behavior, or is my understanding wrong?
>
>
>
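The reader-side failure described in the thread (the namenode already reports a block the datanode has not yet heard of) is a transient race, so until the append branch handles it a client can simply retry the read. A generic sketch of that retry loop, under the assumption that the transient failure surfaces as an IOException; the helper name and the Callable wrapping are hypothetical, and in real code the Callable would wrap the FSDataInputStream open/read:

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class RetryingRead {
    /**
     * Retries an action that can fail transiently, e.g. while the DN has
     * not yet learned about a block the NN already reported. Non-IOException
     * failures propagate immediately; the last IOException is rethrown
     * when the attempts are exhausted.
     */
    public static <T> T withRetries(Callable<T> attempt, int maxTries,
                                    long backoffMs) throws Exception {
        IOException last = null;
        for (int i = 0; i < maxTries; i++) {
            try {
                return attempt.call();
            } catch (IOException e) {
                last = e;               // e.g. "cannot obtain block info"
                Thread.sleep(backoffMs); // back off before the next attempt
            }
        }
        throw last != null ? last : new IOException("no attempts made");
    }
}
```

This only papers over the race from the client side; it does not answer whether the append branch itself should make the half-created third block invisible to readers.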