I think there is minimal performance degradation if you set
dfs.datanode.synconclose to true.
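For reference, a minimal example of enabling it (assuming a release that
ships the flag; a datanode restart is typically needed for it to take
effect) in each datanode's hdfs-site.xml:

  <property>
    <name>dfs.datanode.synconclose</name>
    <value>true</value>
  </property>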
On Tue, Jul 2, 2013 at 3:31 PM, Uma Maheswara Rao G <[EMAIL PROTECTED]> wrote:
> Hi Dave,
> Looks like your analysis is correct. I have faced a similar issue some
> time back. See the discussion link:
> On sudden restarts, the OS filesystem can lose its edits. A similar thing
> happened in our case, i.e., after the restart, blocks were moved back to
> the blocksBeingWritten directory even though they had been finalized.
> After the restart they were marked as corrupt. You could set
> dfs.datanode.synconclose to true to avoid this sort of thing, but that
> will degrade performance.
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Dave
> Sent: 01 July 2013 16:08
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: Re: data loss after cluster wide power loss
> Much appreciated, Suresh. Let me know if I can provide any more
> information or if you'd like me to open a JIRA.
> On Mon, Jul 1, 2013 at 8:48 PM, Suresh Srinivas <[EMAIL PROTECTED]> wrote:
> > Dave,
> > Thanks for the detailed email. Sorry I did not read all the details
> > you had sent earlier completely (on my phone). As you said, this is
> > not related to the HBase log and hsync data loss issue. I think you
> > are right; the rename operation itself might not have hit the disk. I
> > think we should either ensure the metadata operation is synced on the
> > datanode or handle the block being reported as blockBeingWritten. Let
> > me spend some time debugging this issue.
> > One surprising thing is that all the replicas were reported as
> > blockBeingWritten.
> > Regards,
> > Suresh
> > On Mon, Jul 1, 2013 at 6:03 PM, Dave Latham <[EMAIL PROTECTED]> wrote:
> >> (Removing hbase list and adding hdfs-dev list as this is pretty
> >> internal stuff).
> >> Reading through the code a bit:
> >> FSDataOutputStream.close calls
> >>   DFSOutputStream.close calls
> >>     DFSOutputStream.closeInternal
> >>       - sets currentPacket.lastPacketInBlock = true
> >>       - then calls DFSOutputStream.flushInternal
> >>         - enqueues the current packet
> >>         - waits for the ack
> >> BlockReceiver.run
> >>   - if (lastPacketInBlock && !receiver.finalized) calls
> >>     FSDataset.finalizeBlock calls FSDataset.finalizeBlockInternal calls
> >>     FSVolume.addBlock calls FSDir.addBlock calls FSDir.addBlock
> >>       - renames the block from the "blocksBeingWritten" tmp dir to the
> >>         "current" dest dir
> >> This looks to me like a synchronous chain from the DFS client's close
> >> all the way to moving the block files from blocksBeingWritten to the
> >> current dir, so that once the file is closed the block files should be
> >> in the proper directory
> >> - even if the contents of the files are still in the OS buffer rather
> >> than synced to disk. It's only after this move that NameNode.complete
> >> is called. That said, there are several conditions and loops in there,
> >> so without a greater understanding of the code I'm not certain this
> >> chain is fully reliable in all cases.
> >> Could it be the case that the rename operation itself is not synced
> >> and that ext3 lost the fact that the block files were moved?
> >> Or is there a bug in the file close logic such that the block files
> >> are not always moved into place when a file is closed?
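> >> (To illustrate the first question: here is a minimal, hypothetical
> >> sketch - not the datanode's actual code - of making a rename itself
> >> durable on Linux, assuming JDK 7's NIO. The trick is to fsync the
> >> destination's parent directory after the move:
> >>
> >>   import java.io.IOException;
> >>   import java.nio.channels.FileChannel;
> >>   import java.nio.file.Files;
> >>   import java.nio.file.Path;
> >>   import java.nio.file.StandardCopyOption;
> >>   import java.nio.file.StandardOpenOption;
> >>
> >>   class DurableRename {
> >>     // Move src to dst, then fsync dst's parent directory so the
> >>     // directory entry itself is on disk, not just in the page cache.
> >>     static void rename(Path src, Path dst) throws IOException {
> >>       Files.move(src, dst, StandardCopyOption.ATOMIC_MOVE);
> >>       // Opening a directory read-only and calling force() issues an
> >>       // fsync on Linux; without it the filesystem journal may not
> >>       // yet contain the rename when power is lost.
> >>       try (FileChannel dir = FileChannel.open(dst.getParent(),
> >>           StandardOpenOption.READ)) {
> >>         dir.force(true);
> >>       }
> >>     }
> >>   }
> >>
> >> If the datanode does only the plain rename, ext3 could legitimately
> >> forget it on a sudden power loss, which would match what we saw.)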
> >> Thanks for your patience,
> >> Dave
> >> On Mon, Jul 1, 2013 at 3:35 PM, Dave Latham <[EMAIL PROTECTED]> wrote:
> >>> Thanks for the response, Suresh.
> >>> I'm not sure that I understand the details properly. From my reading
> >>> of HDFS-744, the hsync API would allow a client to make sure that at
> >>> any point in time its writes so far have hit the disk. For example,
> >>> for HBase it could apply an fsync after adding some edits to its WAL to