HDFS >> mail # user >> Unable to start NN after rack assignment attempt


Re: Unable to start NN after rack assignment attempt
Todd-

Thanks for your reply. I went out on a limb and started digging in the
source code, and figured the problem was the FSImage. So I saved it, copied
over the copy from my checkpoint directory, and got running again.
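
For anyone following along, the save-and-copy step could be scripted roughly as below. This is only a sketch, not the exact procedure used here: it assumes both dfs.name.dir and the checkpoint directory keep their files in a "current" subdirectory, that the NameNode is stopped while it runs, and the function name and backup naming are made up for illustration.

```python
import shutil
import time
from pathlib import Path


def restore_image_from_checkpoint(name_dir, checkpoint_dir):
    """Back up name_dir/current, then copy checkpoint_dir/current over it.

    Hypothetical helper. Assumes both directories use a 'current'
    subdirectory and that the NameNode is not running. Returns the
    path of the backup that was taken.
    """
    name_current = Path(name_dir) / "current"
    ckpt_current = Path(checkpoint_dir) / "current"
    if not ckpt_current.is_dir():
        raise FileNotFoundError(f"no checkpoint found at {ckpt_current}")

    # Keep the suspect image so it can still be examined later.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = Path(name_dir) / f"current.bad-{stamp}"
    shutil.copytree(name_current, backup)

    # Overwrite each file in the name directory with the checkpoint's copy.
    for src in ckpt_current.iterdir():
        if src.is_file():
            shutil.copy2(src, name_current / src.name)
    return backup
```

Taking the backup first matters: the truncated image is the only evidence of what went wrong, so it should not be overwritten.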

I ran a few jobs to test, then went back to getting the problem new node
running. Once again it looks like I will have to manually force an exit
from safe mode to run fsck -move.
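
The two steps mentioned above, sketched as a small wrapper. It assumes the `hadoop` launcher script is on the PATH and that the whole namespace ("/") is the fsck target; the function names and the dry-run default are illustrative only, so nothing touches the cluster unless asked.

```python
import subprocess


def recovery_commands(path="/"):
    """Return the two commands as argument lists, in the order they run."""
    return [
        # Force the NameNode out of safe mode.
        ["hadoop", "dfsadmin", "-safemode", "leave"],
        # Check the given path and move corrupted files to /lost+found.
        ["hadoop", "fsck", path, "-move"],
    ]


def run_recovery(path="/", dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in recovery_commands(path):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling run_recovery() with the defaults only prints the commands, which is a cheap way to review them before running against a cluster that is already fragile.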

I sent mail to Harsh earlier - I think I must migrate to CDH as I fear
my manual hacking with configs and such has caused the fragile state
that the cluster is in now.

Thanks,

Terry

On 05/18/2012 12:34 PM, Todd Lipcon wrote:
> Hi Terry,
>
> It seems like something got truncated in your FSImage... though it's
> unclear how that might have happened.
>
> If you're able to share your logs and your dfs.name.dir contents, feel
> free to contact me off-list and I can try to take a look to diagnose
> the issue and try to recover the system. Of course whenever any
> corruption issue occurs we take it seriously and want to get at a root
> cause to prevent future occurrences!
>
> Thanks
> -Todd
>
> On Fri, May 18, 2012 at 6:57 AM, Terry Healy <[EMAIL PROTECTED]> wrote:
>> Sorry, forgot to attach the trace:
>> <code>
>> 2012-05-18 09:54:45,355 INFO
>> org.apache.hadoop.hdfs.server.common.Storage: Number of files = 128
>> 2012-05-18 09:54:45,379 ERROR
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem
>> initialization failed.
>> java.io.EOFException
>>        at java.io.DataInputStream.readFully(DataInputStream.java:180)
>>        at org.apache.hadoop.io.UTF8.readFields(UTF8.java:112)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.FSImage.readString(FSImage.java:1808)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:901)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:824)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:372)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:100)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:388)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:362)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:276)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:496)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279)
>>        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1288)
>> 2012-05-18 09:54:45,380 ERROR
>> org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.EOFException
>>        at java.io.DataInputStream.readFully(DataInputStream.java:180)
>>        at org.apache.hadoop.io.UTF8.readFields(UTF8.java:112)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.FSImage.readString(FSImage.java:1808)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:901)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:824)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:372)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:100)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:388)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:362)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:276)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:496)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279)
>>        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1288)
>>
>> 2012-05-18 09:54:45,380 INFO
>> org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:

Terry Healy / [EMAIL PROTECTED]
Cyber Security Operations
Brookhaven National Laboratory
Building 515, Upton N.Y. 11973