Re: corrupt WAL and Java Heap Space...
We just hit the same issue.  I attached log snippets from the regionserver
and master to https://issues.apache.org/jira/browse/HBASE-4107

I was able to get the log file out of hdfs.  Is there a location I can put
it back in to have it picked up?


On Fri, Jul 15, 2011 at 12:23 PM, Andy Sautins wrote:

>  I don't have the log still.  Not sure what I was thinking deleting it.  I
> was a little too aggressive wanting to get my fsck back to having 0 corrupt
> blocks.
>  What you say is interesting.  It's more than possible that I'm
> misunderstanding what is going on.
>  What we saw with the log file is that we could cat it, but couldn't copy
> the file ( would complain about a bad checksum ).  I know that's not hard
> data, but going by that, what you say about applying the log up until the
> last sync would make sense.  What might have thrown me is that after a
> re-start the logs ( including the corrupt log ) were still in the .logs
> folder.  We did a full shutdown/restart and the following stacktrace was in
> the master logs. After this stacktrace hbase continued to startup, however
> the logs ( all logs up until the corrupt log ) for the region with the
> corrupt log file were left in the .logs directory.  When we removed the
> corrupt log file and re-started again all the existing logs were removed
> after successful restart as I would expect.
>   So is it more likely that the error on shutdown is reasonable and that
> the log cleanup just didn't happen on startup?  I suppose it makes sense not
> to remove them if there is an error, but it did throw me that the corrupt
> file as well as previous files were still in the .logs directory.
> 2011-07-14 18:07:45,954 ERROR org.apache.hadoop.hbase.master.MasterFileSystem: Failed splitting hdfs://hdnn.dfs.returnpath.net:8020/user/hbase/.logs/hd31.dfs.returnpath.net,60020,1309294522164
> org.apache.hadoop.fs.ChecksumException: Checksum error: /blk_-8148723766791273697:of:/user/hbase/.logs/hd31.dfs.returnpath.net,60020,1309294522164/hd31.dfs.returnpath.net%3A60020.1310675410770 at 57790464
>        at org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:277)
>        at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:241)
>        at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:176)
>        at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:193)
>        at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
>        at org.apache.hadoop.hdfs.DFSClient$BlockReader.read(DFSClient.java:1249)
>        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:1899)
>        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1951)
>        at java.io.DataInputStream.read(DataInputStream.java:132)
>        at java.io.DataInputStream.readFully(DataInputStream.java:178)
>        at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
>        at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
>        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1945)
>        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1845)
>        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1891)
>        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:198)
>        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:172)
>        at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.parseHLog(HLogSplitter.java:429)
>        at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:262)
>        at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:188)
>        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:197)
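
To illustrate the "apply the log up until the last good point" idea discussed in the quoted message, here is a minimal, self-contained Java sketch. It is not HBase or HDFS code. The record layout (length, payload, CRC32), the class name WalReplaySketch, and the methods writeLog and replayUntilCorrupt are all made up for this example. It only shows the general technique of reading checksummed records in order and stopping at the first one that fails verification, keeping everything read before it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.CRC32;

class WalReplaySketch {

    // Hypothetical record layout: [int length][payload bytes][long CRC32 of payload].
    static byte[] writeLog(List<String> edits) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (String edit : edits) {
            byte[] payload = edit.getBytes(StandardCharsets.UTF_8);
            CRC32 crc = new CRC32();
            crc.update(payload);
            out.writeInt(payload.length);
            out.write(payload);
            out.writeLong(crc.getValue());
        }
        out.flush();
        return bytes.toByteArray();
    }

    // Returns every edit read before the first bad checksum, bad length,
    // or truncated record. Later records are not attempted.
    static List<String> replayUntilCorrupt(byte[] log) throws IOException {
        List<String> recovered = new ArrayList<>();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(log));
        while (true) {
            try {
                int length = in.readInt();
                // A garbage length field must not trigger a huge allocation.
                if (length < 0 || length > log.length) {
                    break;
                }
                byte[] payload = new byte[length];
                in.readFully(payload);
                long stored = in.readLong();
                CRC32 crc = new CRC32();
                crc.update(payload);
                if (crc.getValue() != stored) {
                    break; // checksum mismatch: keep what was read so far
                }
                recovered.add(new String(payload, StandardCharsets.UTF_8));
            } catch (EOFException e) {
                break; // clean end of log, or a truncated trailing record
            }
        }
        return recovered;
    }

    public static void main(String[] args) throws IOException {
        byte[] log = writeLog(Arrays.asList("put row1", "put row2", "put row3"));
        // Flip one bit in the last payload byte of the third record
        // to simulate a corrupt block near the end of the file.
        log[log.length - 9] ^= 0x01;
        System.out.println(replayUntilCorrupt(log)); // prints [put row1, put row2]
    }
}
```

In this sketch the reader validates the length field before allocating the payload buffer, so a corrupted length cannot cause an oversized allocation. Whether the real log splitter behaves this way in a given HBase version is something to check against the source or HBASE-4107 rather than assume from this example.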