Re: why not have Hadoop back up the NameNode data to local disk daily or hourly?
I actually hit this exact same error. After my NameNode has been running for
a while (with an SNN), it reaches a point where the SNN starts crashing, and
if I then try to restart the NN I get this problem. I typically wind up
having to fall back to a much older copy of the image and edits files to get
it up and running, which naturally means data loss.
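On the subject-line question: nothing stops you from snapshotting the metadata yourself. Below is a minimal sketch (the class name and paths are my own invention, not from this thread) that copies everything under dfs.name.dir into a timestamped directory, something you could schedule hourly or daily:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.stream.Stream;

// Sketch of a periodic NameNode metadata backup: snapshot everything
// under dfs.name.dir into a timestamped backup directory. The class
// name and demo paths are illustrative, not taken from this thread.
public class NameNodeBackup {
    static Path backup(Path nameDir, Path backupRoot) throws IOException {
        String stamp = new SimpleDateFormat("yyyyMMdd-HHmmss").format(new Date());
        Path dest = backupRoot.resolve(stamp);
        try (Stream<Path> files = Files.walk(nameDir)) {
            files.forEach(src -> {
                try {
                    Path target = dest.resolve(nameDir.relativize(src).toString());
                    if (Files.isDirectory(src)) {
                        Files.createDirectories(target);
                    } else {
                        Files.createDirectories(target.getParent());
                        Files.copy(src, target, StandardCopyOption.REPLACE_EXISTING);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        return dest;
    }

    public static void main(String[] args) throws IOException {
        // Demo against throwaway directories standing in for dfs.name.dir.
        Path src = Files.createTempDirectory("namedir");
        Files.write(src.resolve("fsimage"), "img".getBytes());
        Files.write(src.resolve("edits"), "log".getBytes());
        Path dest = backup(src, Files.createTempDirectory("backups"));
        System.out.println(Files.exists(dest.resolve("fsimage"))); // prints "true"
    }
}
```

In practice you would trigger this (or an equivalent cp/rsync) from cron, ideally right after a SecondaryNameNode checkpoint so the edits file has just been rolled and the copy is as consistent as possible.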
On Mon, Dec 24, 2012 at 8:22 PM, 周梦想 <[EMAIL PROTECTED]> wrote:
> Thanks Tariq,
> We are now trying to recover the data, but some of it has been lost forever.
> The logs just reported a NullPointerException:
> 2012-12-17 17:09:05,646 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NullPointerException
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1094)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1106)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addNode(FSDirectory.java:1009)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedAddFile(FSDirectory.java:208)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:626)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1015)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:833)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:372)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:100)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:388)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:362)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:276)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:496)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1288)
> We changed the Hadoop source to try/catch this exception and rebuilt it;
> that let us start the NN, but the HBase problem remained, so we had to
> upgrade HBase and try to repair the HBase META data from the region data.
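The try/catch change described above can be illustrated in isolation. This is a standalone sketch, not the actual FSEditLog/FSDirectory code, and the record contents are invented: each edit record is replayed inside its own try/catch, so a corrupt record is skipped instead of aborting startup, at the cost of silently losing whatever that record described (which matches the data loss reported in this thread).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Standalone illustration (not Hadoop source) of the patch described in
// this thread: replay edit records one at a time and skip any record
// whose replay throws, instead of letting one bad record abort startup.
public class EditReplaySketch {
    static List<String> replay(List<String> editRecords) {
        List<String> applied = new ArrayList<>();
        int skipped = 0;
        for (String record : editRecords) {
            try {
                if (record == null) {
                    // Stands in for the NullPointerException thrown from
                    // FSDirectory.addChild() while replaying a corrupt edit.
                    throw new NullPointerException("corrupt edit record");
                }
                applied.add(record); // "apply" the edit
            } catch (NullPointerException e) {
                skipped++; // skip and continue; this record's data is lost
            }
        }
        System.out.println("applied=" + applied.size() + ", skipped=" + skipped);
        return applied;
    }

    public static void main(String[] args) {
        // prints "applied=2, skipped=1"
        replay(Arrays.asList("mkdir /a", null, "addFile /a/b"));
    }
}
```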
> We are now planning to upgrade to the stable Hadoop 1.0.4 and HBase.
> Best regards,
> 2012/12/24 Mohammad Tariq <[EMAIL PROTECTED]>
>> Hello Andy,
>> I hope you are stable now :)
>> Just a quick question. Did you find anything interesting in the NN, SNN,
>> DN logs?
>> And my grandma says I look like Abhishek Bachchan <http://en.wikipedia.org/wiki/Abhishek_Bacchan> ;)
>> Best Regards,
>> On Mon, Dec 24, 2012 at 4:24 PM, 周梦想 <[EMAIL PROTECTED]> wrote:
>>> I stopped Hadoop, changed every node's IP, reconfigured, and started
>>> Hadoop again. Yes, we did change the IP of the NN.
>>> 2012/12/24 Nitin Pawar <[EMAIL PROTECTED]>
>>>> What do you mean by "We changed all IPs of the Hadoop system"?
>>>> Did you change the IPs of the nodes in one go, or did you retire nodes
>>>> one by one, change their IPs, and bring them back into rotation? Also,
>>>> did you change the IP of your NN as well?
>>>> On Mon, Dec 24, 2012 at 4:10 PM, 周梦想 <[EMAIL PROTECTED]> wrote:
>>>>> Actually, the problem began at the SecondaryNameNode. We changed all
>>>>> the IPs of the Hadoop system.
>>>> Nitin Pawar