-
Reason of Formatting Namenode
Adarsh Sharma 2011-03-10, 04:27
Dear all,
I have configured Hadoop clusters of 2, 3, 5 and 8 nodes several times, but one doubt always remains in my mind: why is it necessary to format the Hadoop NameNode with the *bin/hadoop namenode -format* command? What is the reason and logic behind this?
Please explain if someone knows. Thanks & best regards,
Adarsh Sharma
-
Re: Reason of Formatting Namenode
Harsh J 2011-03-10, 05:01
Formatting the NameNode initializes the FSNamesystem in the dfs.name.dir directories, preparing them for use.
The format command typically writes a VERSION file that specifies the namespaceID of this FS instance, its cTime (creation time), and the layout version of the files in use.
This helps make every NameNode instance unique, among other things. DataNode storage directories carry the same namespace ID information, which lets them relate their blocks to a NameNode (and thereby validate, etc.).
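To make the VERSION file described above concrete, here is a minimal Python sketch that parses a file of that shape. The sample contents are invented for illustration, and the field names (namespaceID, cTime, storageType, layoutVersion) are an assumption based on 0.20-era releases; other versions may add or rename fields.

```python
def parse_version_file(text):
    """Parse the key=value lines of a Hadoop storage VERSION file."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the leading '#' comment line (a timestamp).
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        fields[key.strip()] = value.strip()
    return fields


# Hypothetical VERSION contents; every value here is made up.
SAMPLE_VERSION = """\
#Thu Mar 10 05:01:00 IST 2011
namespaceID=1234567890
cTime=0
storageType=NAME_NODE
layoutVersion=-18
"""

fields = parse_version_file(SAMPLE_VERSION)
print(fields["namespaceID"], fields["cTime"], fields["layoutVersion"])
```

On a real cluster the file would normally be found under the current/ subdirectory of each dfs.name.dir directory, and a similar file sits under each DataNode data directory.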
-- Harsh J www.harshj.com
-
Re: Reason of Formatting Namenode
Adarsh Sharma 2011-03-10, 05:48
Thanks Harsh, so that is why an "Incompatible namespaceIDs" error occurs if we format the NameNode again after loading some data. Best regards,
Adarsh Sharma
-
Re: Reason of Formatting Namenode
Edward Capriolo 2011-03-10, 22:48
If you do not tell your NN where to store its data, it stores it under /tmp, and your operating system cleans up /tmp.
The reason for the error you see is that DataNodes do not like to suddenly connect to a new NameNode. As a safety measure, they do not start up until their old data is cleared.
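A rough Python sketch of why the data ends up under /tmp, assuming the 0.20-era defaults (hadoop.tmp.dir set to /tmp/hadoop-${user.name}, with dfs.name.dir and dfs.data.dir derived from it). The resolve function and the override path /data/hdfs/name are hypothetical; this only mimics the variable expansion and is not Hadoop's own Configuration class.

```python
import re

# Assumed defaults, as in 0.20-era core-default.xml and hdfs-default.xml.
DEFAULTS = {
    "hadoop.tmp.dir": "/tmp/hadoop-${user.name}",
    "dfs.name.dir": "${hadoop.tmp.dir}/dfs/name",
    "dfs.data.dir": "${hadoop.tmp.dir}/dfs/data",
}

VARIABLE = re.compile(r"\$\{([^}]+)\}")


def resolve(key, overrides=None, user="hadoop"):
    """Return the effective value of `key` after expanding ${...} references."""
    conf = dict(DEFAULTS)
    conf.update(overrides or {})  # site-specific settings win over defaults
    conf["user.name"] = user
    value = conf[key]
    # Expand one reference at a time, with a bounded number of passes.
    for _ in range(20):
        match = VARIABLE.search(value)
        if match is None:
            break
        value = value[:match.start()] + conf[match.group(1)] + value[match.end():]
    return value


# With no override, the NameNode metadata lands under /tmp.
print(resolve("dfs.name.dir"))
# With an explicit (hypothetical) directory, it does not.
print(resolve("dfs.name.dir", {"dfs.name.dir": "/data/hdfs/name"}))
```

Setting dfs.name.dir (and dfs.data.dir) explicitly in the site configuration is what keeps the metadata out of a directory the operating system may clean up.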
-
Re: Reason of Formatting Namenode
Boris Shkolnik 2011-03-10, 23:23
On the first run you want the NameNode to initialize its directories (where it stores the VERSION file, fsimage and edits). On subsequent formats you are making sure you have a new, EMPTY file system. If you don't format, the NameNode will load the existing fsimage and edits. There is also the matter of generating a new namespace ID, which is matched against the DataNodes' IDs. So if you format the NameNode, you need to clean up the data on the DataNodes.
On the other hand, if you just add DataNodes to a running cluster, you don't have to format the NN.
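The behaviour described above can be sketched as a small Python simulation. It is a simplification under stated assumptions, not the actual HDFS code: format_namenode and datanode_can_join are hypothetical names, the namespace is a plain dict, and the ID is simply a random positive integer.

```python
import random


def format_namenode():
    """Simulate a format: a new, empty namespace with a freshly generated ID."""
    return {"namespaceID": random.randint(1, 2**31 - 1), "files": []}


def datanode_can_join(namenode, datanode_namespace_id):
    """A DataNode with no stored ID (None) may join and adopt the NameNode's ID.
    A DataNode that already stores an ID may join only if the IDs match."""
    if datanode_namespace_id is None:
        return True
    return datanode_namespace_id == namenode["namespaceID"]


# First format: a new DataNode joins and remembers the namespace ID.
namenode = format_namenode()
stored_id = namenode["namespaceID"]
namenode["files"].append("/user/adarsh/data.txt")  # hypothetical file

# Second format: the namespace is empty again and has a new ID, so the
# DataNode's stored ID will almost certainly no longer match.
reformatted = format_namenode()
print(reformatted["files"])
print(datanode_can_join(reformatted, stored_id))

# After the DataNode's old storage is cleared, it can join again.
print(datanode_can_join(reformatted, None))
```

Adding a fresh DataNode corresponds to the None case: it has no stored ID, so nothing on the NameNode needs to be formatted.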
Boris.