MapReduce, mail # user - data loss after cluster wide power loss


Dave Latham 2013-07-01, 22:00
We're running HBase over HDFS 1.0.2 on about 1000 nodes.  On Saturday the
data center we were in had a total power failure and the cluster went down
hard.  When we brought it back up, HDFS reported 4 files as CORRUPT.  We
recovered the data in question from our secondary datacenter, but I'm
trying to understand what happened and whether this is a bug in HDFS that
should be fixed.

From what I can tell, the file was created and closed by the DFS client
(HBase).  HBase then renamed it into a new directory and deleted some other
files containing the same data.  Then the cluster lost power.  After the
cluster was restarted, the DataNodes reported in to the NameNode, but the
blocks for this file appeared as "blocks being written"; the NameNode
rejected them and the DataNodes deleted the blocks.  At that point there
were no replicas for the blocks and the files were marked CORRUPT.  The
underlying file systems are ext3.  Some questions I would love to get
answers to, if anyone with a deeper understanding of HDFS can chime in:

 - Is this a known scenario where data loss is expected?  (I found
HDFS-1539, but that seems different.)
 - When are blocks moved from blocksBeingWritten to current?  Does that
happen before a file close operation is acknowledged to an HDFS client?
 - Could it be that the DataNodes actually moved the blocks to current, but
after the restart ext3 somehow rewound state (forgive my ignorance of
underlying file system behavior)?
 - Is there any other explanation for how this could happen?
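
On the third question, my (possibly wrong) understanding of POSIX/ext3
semantics is that a rename alone updates directory metadata only, so it can
be rolled back by a crash unless both the file and its parent directory are
fsync'ed.  The `durable_rename` below is a hypothetical Python sketch of
what a crash-durable rename would require at the file-system level; it is
illustrative only, not DataNode code:

```python
import os

def durable_rename(src, dst):
    """Rename src to dst so the rename should survive a power loss.

    On ext3/ext4 a bare rename touches only in-memory directory
    metadata; after a crash the old directory state can reappear
    unless the parent directory is fsync'ed as well.
    """
    # 1. Push the file's data blocks to stable storage.
    fd = os.open(src, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)

    # 2. Atomic rename (src and dst on the same filesystem).
    os.rename(src, dst)

    # 3. Persist the directory entry change itself.
    dir_fd = os.open(os.path.dirname(dst) or ".", os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

If the DataNode moves blocks between directories without the equivalent of
step 3, that might explain how a power loss could "rewind" the move.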

Here is a sequence of selected relevant log lines from the RS (HBase
RegionServer), NN (NameNode), and DN (DataNode - one example of the 3 in
question).  It includes everything that mentions the block in question in
the NameNode log and one DataNode log.  Please let me know if there is more
information that would be helpful.

RS 2013-06-29 11:16:06,812 DEBUG org.apache.hadoop.hbase.util.FSUtils:
Creating
file=hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c
with permission=rwxrwxrwx
NN 2013-06-29 11:16:06,830 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.allocateBlock:
/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c.
blk_1395839728632046111_357084589
DN 2013-06-29 11:16:06,832 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
blk_1395839728632046111_357084589 src: /10.0.5.237:14327 dest: /
10.0.5.237:50010
NN 2013-06-29 11:16:11,370 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.addStoredBlock: blockMap updated: 10.0.6.1:50010 is added to
blk_1395839728632046111_357084589 size 25418340
NN 2013-06-29 11:16:11,370 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.addStoredBlock: blockMap updated: 10.0.6.24:50010 is added to
blk_1395839728632046111_357084589 size 25418340
NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.addStoredBlock: blockMap updated: 10.0.5.237:50010 is added to
blk_1395839728632046111_357084589 size 25418340
DN 2013-06-29 11:16:11,385 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Received block
blk_1395839728632046111_357084589 of size 25418340 from /10.0.5.237:14327
DN 2013-06-29 11:16:11,385 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 2 for
block blk_1395839728632046111_357084589 terminating
NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange:
Removing lease on  file
/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c
from client DFSClient_hb_rs_hs745,60020,1372470111932
NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: DIR*
NameSystem.completeFile: file
/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c
is closed by DFSClient_hb_rs_hs745,60020,1372470111932
RS 2013-06-29 11:16:11,393 INFO org.apache.hadoop.hbase.regionserver.Store:
Renaming compacted file at
hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c
to
hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/n/6e0cc30af6e64e56ba5a539fdf159c4c
RS 2013-06-29 11:16:11,505 INFO org.apache.hadoop.hbase.regionserver.Store:
Completed major compaction of 7 file(s) in n of
users-6,\x12\xBDp\xA3,1359426311784.b5b0820cde759ae68e333b2f4015bb7e. into
6e0cc30af6e64e56ba5a539fdf159c4c, size=24.2m; total size for store is 24.2m
NN 2013-06-29 12:01:19,743 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.addStoredBlock: addStoredBlock request received for
blk_1395839728632046111_357084589 on 10.0.6.1:50010 size 21978112 but was
rejected: Reported as block being written but is a block of closed file.
NN 2013-06-29 12:01:19,743 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.addToInvalidates: blk_1395839728632046111 is added to invalidSet
of 10.0.6.1:50010
NN 2013-06-29 12:01:20,155 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.addStoredBlock: addStoredBlock request received for
blk_1395839728632046111_357084589 on 10.0.5.237:50010 size 16971264 but was
rejected: Reported as block being written but is a block of closed file.
NN 2013-06-29 12:01:20,155 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.addToInvalidates: blk_1395839728632046111 is added to invalidSet
of 10.0.5.237:50010
NN 2013-06-29 12:01:20,175 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.addStoredBlock: addStoredBlock request received for
blk_1395839728632046111_357084589 on 10.0.6.24:50010 size 21913088 but was
rejected: Reported as block being written but is a block of closed file.
NN 2013-06-29 12:01:20,175 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.addToInvalidates: blk_1395839728632046111 is added to invalidSet
of 10.0.6.24:50010
(Note: the clock on the server running this DN is wrong after the restart.
I believe its timestamps are off by 6 hours.)
DN 2013-06-29 06:07:22,877 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Scheduling block
blk_1395