

Recovering corrupt HLog files
Hello all,

In an AWS outage we lost about a fifth of our regionservers and about an
eighth of our total datanodes.  Despite a replication factor of 3, it appears
we may have lost some data from corrupt HLogs.  Looking at my hmaster I see
messages like this:

12/06/30 00:00:48 INFO wal.HLogSplitter: Got while parsing hlog
Marking as corrupted

We are back to stable operation now, and while researching this I found
the hdfs://my-namenode-ip-addr:8020/hbase/.corrupt directory.  There are 20
files listed there.
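
To cross-check those 20 files against the master log, a minimal sketch of a
script that collects the HLogSplitter corruption messages might look like the
following. It assumes each log entry sits on a single line in the shape
"date time LEVEL logger: message" (as in the excerpt above, before mail
wrapping); the function names are hypothetical, and the count it yields says
nothing about how many edits were actually lost.

```python
import re
from collections import Counter

# Assumed log line shape: "yy/MM/dd HH:mm:ss LEVEL logger: message".
# This mirrors the excerpt quoted in this message; adjust for other layouts.
LINE_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{2}) (\d{2}:\d{2}:\d{2}) (\w+) ([\w.]+): (.*)$"
)


def find_corrupt_markers(lines):
    """Return (date, time, message) for each HLogSplitter corruption message."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if not match:
            continue  # continuation lines, stack traces, other formats
        date, time, _level, logger, message = match.groups()
        if logger == "wal.HLogSplitter" and "Marking as corrupted" in message:
            hits.append((date, time, message))
    return hits


def count_by_date(hits):
    """Count corruption messages per log date."""
    return Counter(date for date, _time, _message in hits)
```

Feeding it the lines of the master log (for example via
`find_corrupt_markers(open(path))`, where `path` is wherever the log lives)
gives a list whose length can be compared with the number of files under
.corrupt.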

What are our options for tracking down and potentially recovering any data
that was lost?  And how can we even tell what was lost, if anything?  Does the
existence of these files pretty much guarantee data loss?  There doesn't
seem to be much documentation on this.  From what I have read, it seems
possible that part of each of these files was recovered.

Any help would be appreciated.