
Re: desperate question about NameNode startup sequence
The problem with the checkpointing daemon (the 2NN, i.e. the SecondaryNameNode)
is that it happily "runs" and gives no outward indication when it is unable to
connect to the NameNode.

Even with an edits file that large, your startup will complete; at that size,
however, it could take hours. It logs nothing while this is going on, but as
long as the CPU is busy it is making progress.

We have a Nagios check on the size of this directory (the one in your listing),
so if edit rolling stops we know about it.
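
For anyone who wants something similar, here is a minimal sketch of such a
check in Python. It is not our actual plugin. The function names and the
thresholds are hypothetical and would need tuning per cluster. The only fixed
convention is the Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL,
3 UNKNOWN).

```python
import os
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def dir_size_bytes(path):
    """Sum the sizes of the regular files directly inside path."""
    total = 0
    for entry in os.scandir(path):
        if entry.is_file(follow_symlinks=False):
            total += entry.stat(follow_symlinks=False).st_size
    return total


def check_dir_size(path, warn_bytes, crit_bytes):
    """Return (exit_code, message) for a Nagios-style size check.

    warn_bytes and crit_bytes are hypothetical thresholds; pick values
    that fit how large edits normally grows between checkpoints.
    """
    try:
        size = dir_size_bytes(path)
    except OSError as exc:
        return UNKNOWN, "UNKNOWN - cannot read %s: %s" % (path, exc)
    if size >= crit_bytes:
        return CRITICAL, "CRITICAL - %s is %d bytes" % (path, size)
    if size >= warn_bytes:
        return WARNING, "WARNING - %s is %d bytes" % (path, size)
    return OK, "OK - %s is %d bytes" % (path, size)


# Command-line use: <script> <directory> <warn_bytes> <crit_bytes>
if __name__ == "__main__" and len(sys.argv) == 4:
    code, message = check_dir_size(sys.argv[1], int(sys.argv[2]), int(sys.argv[3]))
    print(message)
    sys.exit(code)
```

Pointed at the name/current directory with a warning threshold a little above
the usual fsimage plus edits size, a check like this would have started
alerting long before edits.new reached 39G.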

On Saturday, December 17, 2011, Brock Noland <[EMAIL PROTECTED]> wrote:
> Hi,
> Since you're using CDH2, I am moving this to CDH-USER. You can subscribe at
> http://groups.google.com/a/cloudera.org/group/cdh-user
> BCC'd common-user
> On Sat, Dec 17, 2011 at 2:01 AM, Meng Mao <[EMAIL PROTECTED]> wrote:
>> Maybe this is a bad sign -- the edits.new was created before the master
>> node crashed, and is huge:
>> -bash-3.2$ ls -lh /hadoop/hadoop-metadata/cache/dfs/name/current
>> total 41G
>> -rw-r--r-- 1 hadoop hadoop 3.8K Jan 27  2011 edits
>> -rw-r--r-- 1 hadoop hadoop  39G Dec 17 00:44 edits.new
>> -rw-r--r-- 1 hadoop hadoop 2.5G Jan 27  2011 fsimage
>> -rw-r--r-- 1 hadoop hadoop    8 Jan 27  2011 fstime
>> -rw-r--r-- 1 hadoop hadoop  101 Jan 27  2011 VERSION
>> could this mean something was up with our SecondaryNameNode and rolling
>> edits file?
> Yes it looks like a checkpoint never completed. It's a good idea to
> monitor the mtime on fsimage to ensure it never gets too old.
> Has a checkpoint completed since you restarted?
> Brock
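
The mtime monitoring suggested above can be sketched along the same lines.
This is only an illustration under stated assumptions: the function name and
the age thresholds are hypothetical, and the fsimage path has to be supplied
by whoever runs it. The return value follows the Nagios plugin exit code
convention.

```python
import os
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_fsimage_age(fsimage_path, warn_secs, crit_secs, now=None):
    """Return (exit_code, message) based on how old fsimage's mtime is.

    warn_secs and crit_secs are hypothetical thresholds; they should be
    comfortably larger than the configured checkpoint interval.
    """
    if now is None:
        now = time.time()
    try:
        age = now - os.stat(fsimage_path).st_mtime
    except OSError as exc:
        return UNKNOWN, "UNKNOWN - cannot stat %s: %s" % (fsimage_path, exc)
    if age >= crit_secs:
        return CRITICAL, "CRITICAL - fsimage last written %d s ago" % age
    if age >= warn_secs:
        return WARNING, "WARNING - fsimage last written %d s ago" % age
    return OK, "OK - fsimage last written %d s ago" % age
```

In the listing above, fsimage was last written on Jan 27 2011, so by
Dec 17 2011 any threshold measured in hours or days would have been
critical for months.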