MapReduce, mail # user - Backup node crashed with NPE and failed to restart


Re: Backup node crashed with NPE and failed to restart
Harsh J 2012-10-24, 14:58
Hi,

First off, do not use 0.21; it is unsupported and unmaintained. Use 2.x
if you want HA NameNode capabilities. See
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailability.html

Second, BackupNode/CheckpointNode is also not actively maintained, and
may soon be removed in favor of HA NameNodes and (for non-HA setups)
the SecondaryNameNode.

Regarding your metadata: if your NN is still up, issue a "dfsadmin
-saveNamespace" to write a good copy of the image and edits from
memory. If your NN was taken down and no longer starts, try to
restore from an older checkpoint. Do you have one?
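
For reference, a minimal sketch of the saveNamespace step as a shell function. It assumes the "hdfs" launcher script is on your PATH (on some installs the same subcommands are reached through "hadoop dfsadmin" instead). HDFS_BIN and save_namespace are hypothetical names used only for this illustration. The NN only accepts -saveNamespace while it is in safe mode, so the sketch enters safe mode first and leaves it afterwards.

```shell
#!/bin/sh
# Sketch only: ask a running NameNode to write a fresh image from memory.
# HDFS_BIN is a hypothetical override; set it to "echo" to dry-run the steps.
HDFS_BIN="${HDFS_BIN:-hdfs}"

save_namespace() {
    # -saveNamespace is rejected unless the NameNode is in safe mode.
    $HDFS_BIN dfsadmin -safemode enter || return 1
    # Write the in-memory namespace to a new image and reset the edits.
    $HDFS_BIN dfsadmin -saveNamespace || return 1
    # Return the NameNode to normal operation.
    $HDFS_BIN dfsadmin -safemode leave
}
```

Run it as the HDFS superuser by calling save_namespace; with HDFS_BIN=echo it only prints the three commands it would execute. Copy the contents of the name directory somewhere safe before trying anything else on it.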

On Mon, Oct 22, 2012 at 8:25 PM, rongshen.long
<[EMAIL PROTECTED]> wrote:
> Hi,
> I tried to run a backup node on HDFS 0.21, but the daemon crashed with an
> NPE (stack trace below) and
> left an 'edits.new' file in the $dfs.namenode.name.dir/current directory.
> After that, I could not restart the namenode or the backup node because of
> the same exception.
> Could anyone help me recover the cluster? Although the NN can be
> restarted by creating an empty 'edits' file, much data would be lost.
>
> 12/10/09 15:32:45 ERROR namenode.Checkpointer: Throwable Exception in doCheckpoint:
> java.lang.NullPointerException
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1765)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1753)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadEditRecords(FSEditLog.java:708)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:411)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:378)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1209)
>         at org.apache.hadoop.hdfs.server.namenode.BackupStorage.loadCheckpoint(BackupStorage.java:158)
>         at org.apache.hadoop.hdfs.server.namenode.Checkpointer.doCheckpoint(Checkpointer.java:243)
>         at org.apache.hadoop.hdfs.server.namenode.Checkpointer.run(Checkpointer.java:141)
> 12/10/09 15:32:45 WARN namenode.FSNamesystem: ReplicationMonitor thread received InterruptedException.java.lang.InterruptedException: sleep interrupted
> 12/10/09 15:32:45 WARN namenode.DecommissionManager: Monitor interrupted: java.lang.InterruptedException: sleep interrupted
> 12/10/09 15:32:45 INFO namenode.FSNamesystem: Number of transactions: 24 Total time for transactions(ms): 4Number of transactions batched in Syncs: 0 Number of syncs: 25 SyncTimes(ms): 239
> 12/10/09 15:32:45 INFO ipc.Server: Stopping server on 50100
>
>
>
> 2012-10-22
> ________________________________
> rongshen.long

--
Harsh J