Re: data loss after cluster wide power loss
Azuryy Yu 2013-07-01, 23:52
How do I enable "sync on block close" in HDFS?
--Sent from my Sony mobile.
On Jul 2, 2013 6:47 AM, "Lars Hofhansl" <[EMAIL PROTECTED]> wrote:
> HBase is interesting here, because it rewrites old data into new files. So
> a power outage by default would not just lose new data but potentially old
> data as well.
> You can enable "sync on block close" in HDFS, and then at least be sure
> that closed blocks (and thus files) are synced to disk physically.
> I found that if that is paired with the "sync behind writes" fadvise hint
> the performance impact is minimal.
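> (Concretely, on the datanodes that would be something like the following
> in hdfs-site.xml - property names as I remember them, both default to
> false:)
>
> <property>
>   <name>dfs.datanode.synconclose</name>
>   <value>true</value>
> </property>
> <!-- the "sync behind writes" fadvise hint -->
> <property>
>   <name>dfs.datanode.sync.behind.writes</name>
>   <value>true</value>
> </property>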
> -- Lars
> Dave Latham <[EMAIL PROTECTED]> wrote:
> >Thanks for the response, Suresh.
> >I'm not sure that I understand the details properly. From my reading of
> >HDFS-744 the hsync API would allow a client to make sure that at any
> >point in time its writes so far have hit the disk. For example, HBase
> >could apply an hsync after adding some edits to its WAL to ensure those
> >edits are fully durable for a file which is still open.
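> >(For the record, here is roughly what I mean - a minimal sketch against
> >the public FileSystem API; the WAL path is made up, and this assumes a
> >build where FSDataOutputStream supports hsync, i.e. HDFS-744:)
> >
> >import org.apache.hadoop.conf.Configuration;
> >import org.apache.hadoop.fs.FSDataOutputStream;
> >import org.apache.hadoop.fs.FileSystem;
> >import org.apache.hadoop.fs.Path;
> >
> >public class WalHsyncSketch {
> >  public static void main(String[] args) throws Exception {
> >    FileSystem fs = FileSystem.get(new Configuration());
> >    FSDataOutputStream out = fs.create(new Path("/hbase/.logs/wal-example"));
> >    out.write("edit-1".getBytes("UTF-8"));
> >    // hflush() only guarantees the bytes reached the datanodes' buffers;
> >    // hsync() (HDFS-744) additionally asks each datanode to sync to disk.
> >    out.hsync();
> >    out.close();
> >  }
> >}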
> >However, in this case the dfs file was closed and even renamed. Is it
> >the case that even after a dfs file is closed and renamed the data
> >blocks would still not be synced and would still be stored by the
> >datanode in "blocksBeingWritten" rather than in "current"? If that is
> >the case, would it be better for the NameNode not to reject replicas
> >that are in blocksBeingWritten, especially if it doesn't have any other
> >replicas?
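> >(To make the scenario concrete, the close-then-rename pattern at issue
> >looks roughly like this - a sketch with made-up paths, not actual HBase
> >code:)
> >
> >import org.apache.hadoop.conf.Configuration;
> >import org.apache.hadoop.fs.FSDataOutputStream;
> >import org.apache.hadoop.fs.FileSystem;
> >import org.apache.hadoop.fs.Path;
> >
> >public class CloseThenRenameSketch {
> >  public static void main(String[] args) throws Exception {
> >    FileSystem fs = FileSystem.get(new Configuration());
> >    Path tmp = new Path("/hbase/.tmp/hfile-example");
> >    Path dst = new Path("/hbase/table/cf/hfile-example");
> >    FSDataOutputStream out = fs.create(tmp);
> >    out.write(new byte[]{1, 2, 3});
> >    out.close();         // acked to the client, but the datanodes may not
> >                         // have synced the block files to disk yet
> >    fs.rename(tmp, dst); // namenode metadata operation only
> >  }
> >}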
> >On Mon, Jul 1, 2013 at 3:16 PM, Suresh Srinivas <[EMAIL PROTECTED]> wrote:
> >> Yes, this is a known issue.
> >> The HDFS part of this was addressed in
> >> https://issues.apache.org/jira/browse/HDFS-744 for 2.0.2-alpha and is
> >> not available in the 1.x release. I think HBase does not use this API
> >> yet.
> >> On Mon, Jul 1, 2013 at 3:00 PM, Dave Latham <[EMAIL PROTECTED]> wrote:
> >> > We're running HBase over HDFS 1.0.2 on about 1000 nodes. On Saturday
> >> > the data center we were in had a total power failure and the cluster
> >> > went down hard. When we brought it back up, HDFS reported 4 files as
> >> > CORRUPT. We recovered the data in question from our secondary
> >> > datacenter, but I'm trying to understand what happened and whether
> >> > this is a bug in HDFS that should be fixed.
> >> >
> >> > From what I can tell the file was created and closed by the dfs client
> >> > (hbase). Then HBase renamed it into a new directory and deleted some
> >> > other files containing the same data. Then the cluster lost power.
> >> > After the cluster was restarted, the datanodes reported in to the
> >> > namenode but the blocks for this file appeared as "blocks being
> >> > written" - the namenode rejected them and the datanodes deleted the
> >> > blocks. At this point there were no replicas for the blocks and the
> >> > files were marked CORRUPT. The underlying file systems are ext3. Some
> >> > questions that I would love answers for if anyone with deeper
> >> > understanding of HDFS can chime in:
> >> >
> >> > - Is this a known scenario where data loss is expected? (I found
> >> > HDFS-1539 but that seems different)
> >> > - When are blocks moved from blocksBeingWritten to current? Does this
> >> > happen before a file close operation is acknowledged to an hdfs client?
> >> > - Could it be that the DataNodes actually moved the blocks to current
> >> > but after the restart ext3 rewound state somehow (forgive my ignorance
> >> > of underlying file system behavior)?
> >> > - Is there any other explanation for how this can happen?
> >> >
> >> > Here is a sequence of selected relevant log lines from the RS (HBase
> >> > Region Server), NN (NameNode) and DN (DataNode - 1 example of 3 in
> >> > question). It includes everything that mentions the block in question
> >> > in the NameNode and one DataNode log. Please let me know if there is
> >> > more information that would be helpful.