Hadoop >> mail # user >> Re: Namenode failures


Re: Namenode failures
On Sun, Feb 17, 2013 at 5:08 PM, Harsh J <[EMAIL PROTECTED]> wrote:

> Hi Robert,
>
> Are you by any chance adding files carrying unusual encoding?
I don't believe so.  The only files I push to HDFS are SequenceFiles (with protobuf objects in them) and HBase's regions, which again are just protobuf objects.  I don't use any special encodings in the protobufs.
> If it's possible, could you send us a bundle of the corrupted log set (all of the dfs.name.dir contents) so we can inspect what is causing the corruption?
>

I can give the logs, dfs data dir(s), and 2nn dirs.

https://www.dropbox.com/s/heijq65pmb3esvd/hdfs-bug.tar.gz
> The only identified (but rarely occurring) bug around this part in
> 1.0.4 would be https://issues.apache.org/jira/browse/HDFS-4423. The
> other major corruption bug I know of is already fixed in your version,
> being https://issues.apache.org/jira/browse/HDFS-3652 specifically.
>
> We've not had this report from other users so having a reproduced file
> set (data not required) would be most helpful. If you have logs
> leading to the shutdown and crash as well, that'd be good to have too.
>
> P.s. How exactly are you shutting down the NN each time? A kill -9 or
> a regular SIGTERM shutdown?
>

I shut down the NN with 'bin/stop-dfs.sh'.
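For context, stop-dfs.sh stops the daemons with a plain kill (SIGTERM), not kill -9. A small illustrative Python sketch (not Hadoop code; names here are made up for the example) of why that distinction matters for on-disk state:

```python
import signal
import subprocess
import sys
import tempfile
import textwrap
import time

# Toy illustration: a "server" that flushes its state on SIGTERM but
# obviously cannot do so on SIGKILL. A TERM gives the process a chance
# to shut down cleanly; kill -9 cannot be caught at all.
CHILD = textwrap.dedent("""
    import signal, sys, time
    def on_term(signum, frame):
        with open(sys.argv[1], "w") as f:
            f.write("state flushed")   # clean-shutdown work
        sys.exit(0)
    signal.signal(signal.SIGTERM, on_term)
    while True:
        time.sleep(0.1)
""")

def stop(sig):
    """Start the toy server, stop it with `sig`, return what it flushed."""
    with tempfile.NamedTemporaryFile(mode="r", suffix=".state") as f:
        p = subprocess.Popen([sys.executable, "-c", CHILD, f.name])
        time.sleep(0.5)          # give the child time to install its handler
        p.send_signal(sig)
        p.wait()
        return f.read()

print(stop(signal.SIGTERM))  # "state flushed" -- the handler ran
print(stop(signal.SIGKILL))  # ""              -- no chance to flush
```

The same asymmetry is why Harsh asks the question: a kill -9 mid-write can leave metadata files half-written, while a TERM-based stop should not.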
>  On Mon, Feb 18, 2013 at 4:31 AM, Robert Dyer <[EMAIL PROTECTED]> wrote:
> > On Sun, Feb 17, 2013 at 4:41 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
> >>
> >> You can make use of the offline image viewer to diagnose the fsimage file.
> >
> >
> > Is this not included in the 1.0.x branch?  All of the documentation I find for it says to run 'bin/hdfs oev' but I do not have a 'bin/hdfs'.
> >
> >>
> >> Warm Regards,
> >> Tariq
> >> https://mtariq.jux.com/
> >> cloudfront.blogspot.com
> >>
> >>
> >> On Mon, Feb 18, 2013 at 3:31 AM, Robert Dyer <[EMAIL PROTECTED]> wrote:
> >>>
> >>> It just happened again.  This was after a fresh format of HDFS/HBase and I am attempting to re-import the (backed up) data.
> >>>
> >>>   http://pastebin.com/3fsWCNQY
> >>>
> >>> So now if I restart the namenode, I will lose data from the past 3 hours.
> >>>
> >>> What is causing this?  How can I avoid it in the future?  Is there an easy way to monitor (other than a script grep'ing the logs) the checkpoints to see when this happens?
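One low-tech answer to the monitoring question, sketched under the assumption that a successful checkpoint rewrites the fsimage under dfs.name.dir/current (the path layout and thresholds here are illustrative, not an official API): watch the image's mtime and alert when it goes stale.

```python
import os
import time

# Minimal checkpoint monitor. Assumptions (not verified against any
# particular Hadoop release): a successful checkpoint replaces the
# fsimage under <dfs.name.dir>/current, and "too old" means the
# configured fs.checkpoint.period (default 3600s) plus some slack.
def checkpoint_age(name_dir):
    """Seconds since the fsimage under name_dir/current was last rewritten."""
    image = os.path.join(name_dir, "current", "fsimage")
    return time.time() - os.path.getmtime(image)

def checkpoint_is_stale(name_dir, period=3600, slack=600):
    """True if no checkpoint appears to have landed within period+slack."""
    return checkpoint_age(name_dir) > period + slack
```

Run from cron and page on True; querying the NN web UI or JMX counters would be a more robust variant of the same idea, but this avoids grep'ing the logs.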
> >>>
> >>>
> >>> On Sat, Feb 16, 2013 at 2:39 PM, Robert Dyer <[EMAIL PROTECTED]> wrote:
> >>>>
> >>>> Forgot to mention: Hadoop 1.0.4
> >>>>
> >>>>
> >>>> On Sat, Feb 16, 2013 at 2:38 PM, Robert Dyer <[EMAIL PROTECTED]> wrote:
> >>>>>
> >>>>> I am at my wits' end here.  Every single time I restart the namenode, I get this crash:
> >>>>>
> >>>>> 2013-02-16 14:32:42,616 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 168058 loaded in 0 seconds.
> >>>>> 2013-02-16 14:32:42,618 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NullPointerException
> >>>>>     at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1099)
> >>>>>     at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1111)
> >>>>>     at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addNode(FSDirectory.java:1014)
> >>>>>     at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedAddFile(FSDirectory.java:208)
> >>>>>     at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:631)
> >>>>>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1021)
> >>>>>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:839)
> >>>>>     at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:377)
> >>>>>     at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:100)
> >>>>>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:388)
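The trace bottoms out in FSDirectory.addChild while the edit log is being replayed over the loaded image. A toy Python model (purely illustrative, not the real Java code) of how a missing parent inode during replay produces exactly this kind of null dereference:

```python
# Toy model of edit-log replay over an in-memory namespace tree:
# unprotectedAddFile -> addNode -> addChild walks to the parent
# directory, and if the image never recorded that parent, the lookup
# comes back empty and addChild dereferences it anyway.
class INodeDirectory:
    def __init__(self, name):
        self.name = name
        self.children = {}

    def get(self, path):
        """Resolve a path; returns None if any component is missing."""
        node = self
        for part in [p for p in path.split("/") if p]:
            if part not in node.children:
                return None
            node = node.children[part]
        return node

def add_child(root, parent_path, name):
    parent = root.get(parent_path)
    # The real addChild assumes the parent inode exists; when the image
    # is missing it, this is the line that blows up (the Java NPE).
    parent.children[name] = INodeDirectory(name)

root = INodeDirectory("")
root.children["user"] = INodeDirectory("user")
add_child(root, "/user", "data")           # parent present: fine
try:
    add_child(root, "/user/missing", "x")  # parent absent: AttributeError,
except AttributeError:                     # Python's analogue of the NPE
    print("replay failed: parent inode not found")
```

If something like this is what is happening, it would point at an image/edits mismatch (the image missing entries the edits expect), which is consistent with the corruption bugs Harsh cites above.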
Robert Dyer
[EMAIL PROTECTED]