HDFS >> mail # user >> Unable to start NN after rack assignment attempt


Re: Unable to start NN after rack assignment attempt
Todd-

Thanks for your reply. I went out on a limb and started digging in the
source code, and figured it was the FSImage. So I saved it, copied over
the copy from my checkpoint directory, and got running again.
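
For anyone repeating the save-and-copy step, here is a minimal sketch of it in Python. It is not taken from this thread: the function name, the `current/fsimage` relative path, and the `.bad-<timestamp>` suffix are assumptions for illustration, and it should only be run while the NameNode is stopped. It copies only the image file, so any edits newer than the checkpoint are not recovered.

```python
import shutil
import time
from pathlib import Path


def swap_in_checkpoint_image(name_dir, checkpoint_dir, image_rel="current/fsimage"):
    """Set the suspect image aside, then copy the checkpoint's image into place.

    name_dir and checkpoint_dir are hypothetical stand-ins for the directories
    configured as dfs.name.dir and the checkpoint directory. Returns the path
    of the saved damaged image, or None if there was no image to save.
    """
    live = Path(name_dir) / image_rel
    good = Path(checkpoint_dir) / image_rel
    if not good.is_file():
        raise FileNotFoundError(f"no checkpoint image at {good}")

    # Keep the damaged file for later diagnosis instead of overwriting it.
    saved = None
    if live.exists():
        saved = live.with_name(live.name + time.strftime(".bad-%Y%m%d-%H%M%S"))
        shutil.move(str(live), str(saved))

    # Copy the checkpoint image into the name directory.
    live.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(str(good), str(live))  # copy2 also preserves timestamps
    return saved
```

Keeping the damaged file under a new name, rather than deleting it, leaves something to inspect later when looking for the root cause.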

I ran a few jobs to test and then went back to getting a problem new node
running. Once again it looks like I will have to manually force an exit
from safe mode to run fsck -move.

I sent mail to Harsh earlier - I think I must migrate to CDH as I fear
my manual hacking with configs and such has caused the fragile state
that the cluster is in now.

Thanks,

Terry

On 05/18/2012 12:34 PM, Todd Lipcon wrote:
> Hi Terry,
>
> It seems like something got truncated in your FSImage... though it's
> unclear how that might have happened.
>
> If you're able to share your logs and your dfs.name.dir contents, feel
> free to contact me off-list and I can try to take a look to diagnose
> the issue and try to recover the system. Of course whenever any
> corruption issue occurs we take it seriously and want to get at a root
> cause to prevent future occurrences!
>
> Thanks
> -Todd
>
> On Fri, May 18, 2012 at 6:57 AM, Terry Healy <[EMAIL PROTECTED]> wrote:
>> Sorry, forgot to attach the trace:
>> <code>
>> 2012-05-18 09:54:45,355 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 128
>> 2012-05-18 09:54:45,379 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
>> java.io.EOFException
>>        at java.io.DataInputStream.readFully(DataInputStream.java:180)
>>        at org.apache.hadoop.io.UTF8.readFields(UTF8.java:112)
>>        at org.apache.hadoop.hdfs.server.namenode.FSImage.readString(FSImage.java:1808)
>>        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:901)
>>        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:824)
>>        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:372)
>>        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:100)
>>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:388)
>>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:362)
>>        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:276)
>>        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:496)
>>        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279)
>>        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1288)
>> 2012-05-18 09:54:45,380 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.EOFException
>>        at java.io.DataInputStream.readFully(DataInputStream.java:180)
>>        at org.apache.hadoop.io.UTF8.readFields(UTF8.java:112)
>>        at org.apache.hadoop.hdfs.server.namenode.FSImage.readString(FSImage.java:1808)
>>        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:901)
>>        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:824)
>>        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:372)
>>        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:100)
>>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:388)
>>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:362)
>>        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:276)
>>        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:496)
>>        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279)
>>        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1288)
>>
>> 2012-05-18 09:54:45,380 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:

Terry Healy / [EMAIL PROTECTED]
Cyber Security Operations
Brookhaven National Laboratory
Building 515, Upton N.Y. 11973