Re: corrupt WAL and Java Heap Space...
Dave Latham 2011-08-26, 16:44

We just hit the same issue. I attached log snippets from the regionserver
and master into https://issues.apache.org/jira/browse/HBASE-4107.

I was able to get the log file out of hdfs. Is there a location I can put it back in to have it picked up?

Dave

On Fri, Jul 15, 2011 at 12:23 PM, Andy Sautins <[EMAIL PROTECTED]> wrote:
>
> I don't have the log still. Not sure what I was thinking deleting it. I
> was a little too aggressive wanting to get my fsck back to having 0 corrupt
> blocks.
>
> What you say is interesting. It's more than possible that I'm
> misunderstanding what is going on.
>
> What we saw with the log file is that we could cat it, but couldn't copy
> the file ( would complain about a bad checksum ). I know that's not hard
> data, but going by that what you say about applying the log up until the
> last sync would make sense. What might have thrown me is after a
> re-start the logs ( including the corrupt log ) were still in the .logs
> folder. We did a full shutdown/restart and the following stacktrace was in
> the master logs. After this stacktrace hbase continued to startup, however
> the logs ( all logs up until the corrupt log ) for the region with the
> corrupt log file were left in the .logs directory. When we removed the
> corrupt log file and re-started again all the existing logs were removed
> after successful restart as I would expect.
>
> So is it more likely that the error on shutdown is reasonable and that
> the log cleanup just didn't happen on startup? I suppose it makes sense not
> to remove them if there is an error, but it did throw me that the corrupt
> file as well as previous files were still in the .logs directory.
>
> 2011-07-14 18:07:45,954 ERROR org.apache.hadoop.hbase.master.MasterFileSystem: Failed splitting hdfs://hdnn.dfs.returnpath.net:8020/user/hbase/.logs/hd31.dfs.returnpath.net,60020,1309294522164
> org.apache.hadoop.fs.ChecksumException: Checksum error: /blk_-8148723766791273697:of:/user/hbase/.logs/hd31.dfs.returnpath.net,60020,1309294522164/hd31.dfs.returnpath.net%3A60020.1310675410770 at 57790464
>     at org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:277)
>     at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:241)
>     at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:176)
>     at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:193)
>     at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
>     at org.apache.hadoop.hdfs.DFSClient$BlockReader.read(DFSClient.java:1249)
>     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:1899)
>     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1951)
>     at java.io.DataInputStream.read(DataInputStream.java:132)
>     at java.io.DataInputStream.readFully(DataInputStream.java:178)
>     at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
>     at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
>     at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1945)
>     at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1845)
>     at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1891)
>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:198)
>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:172)
>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.parseHLog(HLogSplitter.java:429)
>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:262)
>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:188)
>     at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:197)
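
For readers following the thread, the idea quoted above of "applying the log up until the last sync" can be illustrated with a toy reader that keeps every record whose checksum verifies and stops at the first bad one. This is only a sketch under stated assumptions: the record layout (length, CRC32, payload) and the names `WalReplaySketch`, `frame`, `concat`, and `readUntilCorrupt` are made up for illustration. It is not the HLog or SequenceFile format, and it is not what `HLogSplitter` actually does.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical record layout, for illustration only (NOT the HBase HLog or
// Hadoop SequenceFile format): [int length][long crc32][payload bytes].
final class WalReplaySketch {

    // Frame one payload as a checksummed record.
    static byte[] frame(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        ByteBuffer buf = ByteBuffer.allocate(4 + 8 + payload.length);
        buf.putInt(payload.length).putLong(crc.getValue()).put(payload);
        return buf.array();
    }

    // Concatenate framed records into one in-memory "log".
    static byte[] concat(byte[]... parts) {
        int total = 0;
        for (byte[] p : parts) total += p.length;
        ByteBuffer buf = ByteBuffer.allocate(total);
        for (byte[] p : parts) buf.put(p);
        return buf.array();
    }

    // Return the payloads that verify, stopping at the first truncated
    // or corrupt record instead of failing the whole read.
    static List<byte[]> readUntilCorrupt(byte[] log) {
        List<byte[]> good = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(log);
        while (buf.remaining() >= 12) {              // room for a full header
            int len = buf.getInt();
            long expected = buf.getLong();
            if (len < 0 || len > buf.remaining()) break;   // truncated or garbage length
            byte[] payload = new byte[len];
            buf.get(payload);
            CRC32 crc = new CRC32();
            crc.update(payload, 0, len);
            if (crc.getValue() != expected) break;   // checksum mismatch: stop here
            good.add(payload);
        }
        return good;
    }

    public static void main(String[] args) {
        byte[] log = concat(
            frame("put row1".getBytes(StandardCharsets.UTF_8)),
            frame("put row2".getBytes(StandardCharsets.UTF_8)),
            frame("put row3".getBytes(StandardCharsets.UTF_8)));
        log[log.length - 1] ^= 0x01;                 // damage the last record's payload
        System.out.println("recovered " + readUntilCorrupt(log).size() + " of 3 records");
        // prints "recovered 2 of 3 records"
    }
}
```

In this toy version the edits before the damaged record are still usable, which matches the behaviour described in the quoted message. Whether a real split keeps or discards them depends on the HBase version and configuration, so treat this purely as an illustration of the concept.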