Re: what will happen if a backup name node folder becomes unaccessible?
On Fri, Aug 27, 2010 at 8:30 PM, jiang licht <[EMAIL PROTECTED]> wrote:
> The same behavior is seen in CDH3 hadoop-0.20.2+228 if a mounted NFS folder for dfs.name.dir is not available when the name node starts...
>
> Michael
>
> --- On Fri, 8/27/10, Edward Capriolo <[EMAIL PROTECTED]> wrote:
>
> From: Edward Capriolo <[EMAIL PROTECTED]>
> Subject: Re: what will happen if a backup name node folder becomes unaccessible?
> To: [EMAIL PROTECTED]
> Date: Friday, August 27, 2010, 6:57 PM
>
> On Tue, Aug 24, 2010 at 7:59 PM, Sudhir Vallamkondu
> <[EMAIL PROTECTED]> wrote:
>> The Cloudera distribution seems to keep working fine when a dfs.name.dir
>> directory becomes inaccessible while the namenode is running.
>>
>> See below
>>
>> hadoop@training-vm:~$ hadoop version
>> Hadoop 0.20.1+152
>> Subversion  -r c15291d10caa19c2355f437936c7678d537adf94
>> Compiled by root on Mon Nov  2 05:15:37 UTC 2009
>>
>> hadoop@training-vm:~$ jps
>> 8923 Jps
>> 8548 JobTracker
>> 8467 SecondaryNameNode
>> 8250 NameNode
>> 8357 DataNode
>> 8642 TaskTracker
>>
>> hadoop@training-vm:~$ /usr/lib/hadoop/bin/stop-all.sh
>> stopping jobtracker
>> localhost: stopping tasktracker
>> stopping namenode
>> localhost: stopping datanode
>> localhost: stopping secondarynamenode
>>
>> hadoop@training-vm:~$ mkdir edit_log_dir1
>>
>> hadoop@training-vm:~$ mkdir edit_log_dir2
>>
>> hadoop@training-vm:~$ ls
>> edit_log_dir1  edit_log_dir2
>>
>> hadoop@training-vm:~$ ls -ltr /var/lib/hadoop-0.20/cache/hadoop/dfs/name
>> total 8
>> drwxr-xr-x 2 hadoop hadoop 4096 2009-10-15 16:17 image
>> drwxr-xr-x 2 hadoop hadoop 4096 2010-08-24 15:56 current
>>
>> hadoop@training-vm:~$ cp -r /var/lib/hadoop-0.20/cache/hadoop/dfs/name
>> edit_log_dir1
>>
>> hadoop@training-vm:~$ cp -r /var/lib/hadoop-0.20/cache/hadoop/dfs/name
>> edit_log_dir2
>>
>> ------ hdfs-site.xml added new dirs
>>
>> <?xml version="1.0"?>
>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>
>> <configuration>
>>  <property>
>>    <name>dfs.replication</name>
>>    <value>1</value>
>>  </property>
>>  <property>
>>     <name>dfs.permissions</name>
>>     <value>false</value>
>>  </property>
>>  <property>
>>     <!-- specify this so that running 'hadoop namenode -format' formats the
>> right dir -->
>>     <name>dfs.name.dir</name>
>> <value>/var/lib/hadoop-0.20/cache/hadoop/dfs/name,/home/hadoop/edit_log_dir1,/home/hadoop/edit_log_dir2</value>
>>  </property>
>>   <property>
>>     <name>fs.checkpoint.period</name>
>>     <value>600</value>
>>  </property>
>>  <property>
>>    <name>dfs.namenode.plugins</name>
>>    <value>org.apache.hadoop.thriftfs.NamenodePlugin</value>
>>  </property>
>>  <property>
>>    <name>dfs.datanode.plugins</name>
>>    <value>org.apache.hadoop.thriftfs.DatanodePlugin</value>
>>  </property>
>>  <property>
>>    <name>dfs.thrift.address</name>
>>    <value>0.0.0.0:9090</value>
>>  </property>
>> </configuration>
>>
>> ---- start all daemons
>>
>> hadoop@training-vm:~$ /usr/lib/hadoop/bin/start-all.sh
>> starting namenode, logging to
>> /usr/lib/hadoop/bin/../logs/hadoop-hadoop-namenode-training-vm.out
>> localhost: starting datanode, logging to
>> /usr/lib/hadoop/bin/../logs/hadoop-hadoop-datanode-training-vm.out
>> localhost: starting secondarynamenode, logging to
>> /usr/lib/hadoop/bin/../logs/hadoop-hadoop-secondarynamenode-training-vm.out
>> starting jobtracker, logging to
>> /usr/lib/hadoop/bin/../logs/hadoop-hadoop-jobtracker-training-vm.out
>> localhost: starting tasktracker, logging to
>> /usr/lib/hadoop/bin/../logs/hadoop-hadoop-tasktracker-training-vm.out
>>
>>
>> -------- namenode log confirms all dirs taken
>>
>> 2010-08-24 16:20:48,718 INFO
>> org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
>> /************************************************************
>> STARTUP_MSG: Starting NameNode
>> STARTUP_MSG:   host = training-vm/127.0.0.1
>> STARTUP_MSG:   args = []
>> STARTUP_MSG:   version = 0.20.1+152
>> STARTUP_MSG:   build =  -r c15291d10caa19c2355f437936c7678d537adf94;
NFS is not exactly equivalent to a local file system. For example, you
can soft- or hard-mount an NFS file system, and your system will react
differently if the NFS mount vanishes. On some operating systems a
hard mount will cause an uninterruptible wait. Soft mounts, which I
believe are the Linux default, react differently when the NFS server
vanishes, and that could explain the errors you are getting.
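
As a rough illustration, the difference comes down to mount options
(the server path and mount point below are hypothetical):

    # hard mount: I/O against the mount blocks until the server responds
    mount -t nfs -o hard,intr nfs-server:/export/namedir /mnt/nn-backup

    # soft mount: I/O fails with an error once timeo/retrans retries are
    # exhausted (timeo is in tenths of a second, so ~3s per attempt here)
    mount -t nfs -o soft,timeo=30,retrans=3 nfs-server:/export/namedir /mnt/nn-backup

With the soft mount a vanished server surfaces as an I/O error to the
namenode; with the hard mount the writing thread simply hangs.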

If you take the approach that NFS works "exactly" like a local file
system, you will often be disappointed.
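
If the backup dir has to live on NFS, one hedge is a pre-flight check
before starting the daemons. A minimal sketch, assuming a hypothetical
backup mount point /mnt/nn-backup alongside the stock name dir:

    #!/bin/sh
    # Verify every dfs.name.dir is writable before starting the cluster.
    # 'timeout' keeps a dead hard mount from hanging the check forever.
    for d in /var/lib/hadoop-0.20/cache/hadoop/dfs/name /mnt/nn-backup; do
        if ! timeout 10 sh -c "touch '$d/.nn_check' && rm '$d/.nn_check'"; then
            echo "dfs.name.dir $d is not writable; aborting startup" >&2
            exit 1
        fi
    done
    /usr/lib/hadoop/bin/start-all.sh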