RE: NameNode failure and recovery!
Vijay Thakorlal 2013-04-03, 14:56
The SNN does not act as a backup / standby NameNode in the event of failure.
The sole purpose of the Secondary NameNode (otherwise, and more correctly, known as the Checkpoint Node) is to checkpoint the current state of HDFS:
1) The NN rolls its edits file, so that new changes go to a fresh edits file
2) The SNN retrieves the fsimage and the rolled edits file from the NN
3) The SNN loads the fsimage into memory
4) The SNN replays the edits log against it to merge the two
5) The SNN transfers the merged checkpoint back to the NN
6) The NN uses the checkpoint as its new fsimage file
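The checkpoint cycle above can be sketched as a toy in-memory model. This is only an illustration under stated assumptions: the namespace is a plain dict of path to "file"/"dir", the edit log is a list of (op, path) tuples, and the names `NameNode`, `apply_edit` and `secondary_checkpoint` are hypothetical. Real HDFS exchanges binary fsimage/edits files between the two daemons over HTTP.

```python
# Hypothetical toy model of an SNN checkpoint; not the real HDFS code.

def apply_edit(namespace, edit):
    """Replay one (op, path) edit record against an in-memory namespace."""
    op, path = edit
    if op == "mkdir":
        namespace[path] = "dir"
    elif op == "create":
        namespace[path] = "file"
    elif op == "delete":
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown op: {op}")


class NameNode:
    def __init__(self):
        self.fsimage = {}  # namespace as of the last checkpoint
        self.edits = []    # edit records logged since that checkpoint

    def log(self, op, path):
        self.edits.append((op, path))

    def roll_edits(self):
        """Freeze the current edit log and start a fresh, empty one."""
        rolled, self.edits = self.edits, []
        return rolled

    def install_checkpoint(self, merged):
        # The merged image becomes the new fsimage; edits logged after
        # the roll remain in self.edits for the next checkpoint.
        self.fsimage = merged


def secondary_checkpoint(nn):
    rolled = nn.roll_edits()     # NN rolls its edit log
    image = dict(nn.fsimage)     # SNN fetches (a copy of) the fsimage
    for edit in rolled:          # SNN replays the rolled edits into it
        apply_edit(image, edit)
    nn.install_checkpoint(image) # merged image goes back to the NN


nn = NameNode()
nn.log("mkdir", "/data")
nn.log("create", "/data/a.txt")
secondary_checkpoint(nn)
nn.log("create", "/data/b.txt")  # arrives after the checkpoint
print(nn.fsimage)  # {'/data': 'dir', '/data/a.txt': 'file'}
print(nn.edits)    # [('create', '/data/b.txt')]
```

Rolling the log before the merge is what lets the NN keep accepting writes while the SNN works: new records land in the fresh log and are picked up by the next checkpoint.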
It’s true that you could technically use the fsimage from the SNN if you completely lost the NN, and yes, as you said, you would “lose” any changes to HDFS made between the last checkpoint and the NN dying. But as mentioned, the SNN is not a backup for the NN.
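A small, hypothetical illustration of that data-loss window: replaying the full edit history gives the state the NN had when it died, while replaying only the prefix covered by the last checkpoint gives what the SNN holds. The edit records and the `checkpoint_len` value are made up for the example.

```python
# Hypothetical illustration: the SNN checkpoint only reflects edits up to
# the last checkpoint, so anything logged after it is missing.

def replay(edits):
    """Rebuild a set of file paths from (op, path) edit records."""
    namespace = set()
    for op, path in edits:
        if op == "create":
            namespace.add(path)
        elif op == "delete":
            namespace.discard(path)
    return namespace


all_edits = [("create", "/a"), ("create", "/b"),  # merged in the last checkpoint
             ("create", "/c"), ("delete", "/a")]  # logged after it, then the NN died
checkpoint_len = 2  # number of edits covered by the last checkpoint

nn_state = replay(all_edits)                    # what the NN had when it died
snn_state = replay(all_edits[:checkpoint_len])  # what the SNN checkpoint contains

print(sorted(nn_state))   # ['/b', '/c']
print(sorted(snn_state))  # ['/a', '/b']
```

Note that the loss cuts both ways: the new file `/c` is missing, and the deleted file `/a` reappears.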
From: Rahul Bhattacharjee [mailto:[EMAIL PROTECTED]]
Sent: 03 April 2013 15:40
To: [EMAIL PROTECTED]
Subject: NameNode failure and recovery!
I was reading about Hadoop and learned that there are two ways to protect against NameNode failures:
1) Write the NameNode metadata to an NFS mount in addition to the usual local disk.
2) Use a Secondary NameNode. If the NN fails, the SNN can take over.
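Option 1 corresponds to listing more than one metadata directory in the NameNode's configuration (`dfs.name.dir` in Hadoop 1.x, `dfs.namenode.name.dir` in 2.x), each holding a full copy of the metadata. The sketch below only mimics the idea of writing every edit record to each directory; the function `append_edit`, the `edits` file name and the temporary directories standing in for a local disk and an NFS mount are all hypothetical.

```python
import os
import tempfile


def append_edit(record, name_dirs):
    """Append one edit record to the edit log in every metadata directory.

    Hypothetical sketch: each directory gets its own redundant copy.
    """
    for d in name_dirs:
        path = os.path.join(d, "edits")
        with open(path, "a", encoding="utf-8") as f:
            f.write(record + "\n")
            f.flush()
            os.fsync(f.fileno())  # make the record durable before moving on


# Temp dirs stand in for a local disk and an NFS mount.
local_dir = tempfile.mkdtemp(prefix="local-")
nfs_dir = tempfile.mkdtemp(prefix="nfs-")
config_value = f"{local_dir},{nfs_dir}"  # comma-separated, like the config property
name_dirs = config_value.split(",")

append_edit("create /data/a.txt", name_dirs)
append_edit("delete /data/a.txt", name_dirs)

for d in name_dirs:
    with open(os.path.join(d, "edits"), encoding="utf-8") as f:
        print(f.read().splitlines())
```

Because every directory receives the same records, losing the local disk still leaves a complete copy on the NFS mount, which is why this option, unlike the SNN, involves no lag.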
My questions:
1) The SNN is always lagging, so when the SNN becomes primary after an NN failure, the edits that have not yet been merged into the image file would be lost, and the SNN's state would not be consistent with the NN's state before its failure.
2) I have also read that another purpose of the SNN is to periodically merge the edit logs with the image file. If a setup uses option 1 (writing to NFS, no SNN), then who does this merging?