HDFS reports corrupted blocks after HBase reinstall
Jonathan Bender 2011-04-26, 23:53
Hi all, I'm having a strange error which I can't exactly figure out.
After wiping my /hbase HDFS directory to do a fresh install, I am getting "MISSING BLOCKS" in this /hbase directory, which causes HDFS to start up in safe mode. This doesn't happen until I start my region servers, so I have a feeling there is some kind of corrupted metadata that is being loaded from these region servers.
Is there a graceful way to wipe the HBase directory clean? Are there any local directories on the region servers / master / ZK server that I should be wiping as well?
Cheers, Jon
Re: HDFS reports corrupted blocks after HBase reinstall
Jean-Daniel Cryans 2011-04-26, 23:56
Unless HBase was running when you wiped that out (and even then), I don't see how this could happen. Could you match those blocks to the files using fsck, and figure out when the files were created and whether they were part of the old install?
Thx,
J-D
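One way to follow the suggestion above is to capture the output of `hadoop fsck /hbase -files -blocks -locations` and list the paths it flags. The sketch below is only an illustration: it assumes each relevant fsck line starts with the file path and contains the word MISSING or CORRUPT, and the sample text and the function name `files_with_bad_blocks` are made up for the example, not taken from this cluster.

```python
def files_with_bad_blocks(fsck_output):
    """Return the paths that fsck flags as MISSING or CORRUPT, in first-seen order.

    Assumption: a flagged line begins with the HDFS path (optionally followed
    by a colon) and contains the word MISSING or CORRUPT somewhere after it.
    """
    flagged = []
    for raw in fsck_output.splitlines():
        line = raw.strip()
        # Skip summary lines such as "Status: CORRUPT"; only path lines matter.
        if not line.startswith("/"):
            continue
        path = line.split()[0].rstrip(":")
        if ("MISSING" in line or "CORRUPT" in line) and path not in flagged:
            flagged.append(path)
    return flagged


# Hypothetical fsck output, invented for illustration only.
SAMPLE = """\
/hbase/old_table/1234/info/5678: MISSING 1 blocks of total size 1024 B.
/hbase/hbase.version 3 bytes, 1 block(s):  OK
/hbase/old_table/1234/info/5678: CORRUPT block blk_42
Status: CORRUPT
"""

if __name__ == "__main__":
    for path in files_with_bad_blocks(SAMPLE):
        print(path)
```

The resulting paths can then be compared against the old and new table layouts to see whether the flagged files predate the wipe.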
Re: HDFS reports corrupted blocks after HBase reinstall
Himanshu Vashishtha 2011-04-27, 00:07
Could the /tmp/hbase-<userID> directory be the culprit? Just a wild guess, though.
Re: HDFS reports corrupted blocks after HBase reinstall
Jonathan Bender 2011-04-27, 00:19
Wow, this is more intense than I thought...as soon as I load HBase again, my HDFS filesystem essentially reverts to an older snapshot. As in, I don't see any of the changes I had made since that time, in the hbase table or otherwise.
I'm using CDH3 beta 4, which I believe stores its local hbase data in a different directory--not entirely sure where though.
I'm not entirely sure what happened to mess this up, but it seems pretty serious.
Re: HDFS reports corrupted blocks after HBase reinstall
Jonathan Bender 2011-04-27, 15:28
So it's definitely a case of HDFS not being able to recover the image. Maybe this is better directed toward another list, but has anyone had issues with this, or any suggestions for trying to eradicate this?
2011-04-26 17:15:56,898 INFO org.apache.hadoop.hdfs.server.common.Storage: Recovering storage directory /var/lib/hadoop-0.20/cache/hadoop/dfs/name from failed checkpoint.
2011-04-26 17:15:56,905 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 204
2011-04-26 17:15:57,020 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 0
2011-04-26 17:15:57,021 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 26833 loaded in 0 seconds.
2011-04-26 17:15:57,257 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Invalid opcode, reached end of edit log Number of transactions found 528
2011-04-26 17:15:57,258 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /var/lib/hadoop-0.20/cache/hadoop/dfs/name/current/edits of size 1049092 edits # 528 loaded in 0 seconds.
2011-04-26 17:15:57,265 ERROR org.apache.hadoop.hdfs.server.common.Storage: Unable to save image for /var/lib/hadoop-0.20/cache/hadoop/dfs/name
java.io.IOException: saveLeases found path /hbase/base_tmp/.logs/sv004.my.domain.com,60020,1302882411768/sv004.my.domain.com%3A60020.1302882412951 but no matching entry in namespace.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:5153)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:1071)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.saveCurrent(FSImage.java:1170)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.saveNamespace(FSImage.java:1118)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:100)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:347)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:321)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:267)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:461)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1202)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1211)
2011-04-26 17:15:57,273 WARN org.apache.hadoop.hdfs.server.common.Storage: FSImage:processIOError: removing storage: /var/lib/hadoop-0.20/cache/hadoop/dfs/name
2011-04-26 17:15:57,274 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading FSImage in 1553 msecs
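The path named in the saveLeases error can be picked apart to see which region server the stale lease belongs to. The sketch below is a rough illustration, not part of the original report: it assumes the wrapped path has been rejoined into one token, that the parent directory follows a host,port,startcode layout, and that the startcode is the server start time in epoch milliseconds. The function name `describe_orphaned_lease` is hypothetical.

```python
import re
from datetime import datetime, timezone

# Matches the message text that appears in the NameNode log above.
LEASE_RE = re.compile(
    r"saveLeases found path (\S+) but no matching entry in namespace"
)


def describe_orphaned_lease(message):
    """Extract the leased path and decode the region server directory name.

    Assumptions (not confirmed by the thread): the directory above the log
    file is named host,port,startcode, and startcode is epoch milliseconds.
    Returns None when the message does not contain a saveLeases error.
    """
    match = LEASE_RE.search(message)
    if match is None:
        return None
    path = match.group(1)
    server_dir = path.rstrip("/").split("/")[-2]
    host, port, startcode = server_dir.split(",")
    started = datetime.fromtimestamp(int(startcode) / 1000, tz=timezone.utc)
    return {"path": path, "host": host, "port": int(port), "started_utc": started}


# The error line from the log above, with the wrapped path rejoined.
MESSAGE = (
    "java.io.IOException: saveLeases found path "
    "/hbase/base_tmp/.logs/sv004.my.domain.com,60020,1302882411768/"
    "sv004.my.domain.com%3A60020.1302882412951 "
    "but no matching entry in namespace."
)

if __name__ == "__main__":
    info = describe_orphaned_lease(MESSAGE)
    print(info["host"], info["port"], info["started_utc"].isoformat())
```

If the startcode assumption holds, the decoded start time indicates whether the lease was taken by a region server from the old install or by one started after the wipe.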
Re: HDFS reports corrupted blocks after HBase reinstall
Jean-Daniel Cryans 2011-04-27, 16:43
I don't remember ever seeing this :|
Was your secondary namenode running on a different host or storing its data in a different folder? Was that wiped out too?
J-D