Hadoop >> mail # user >> Split brain - is it possible in hadoop?


Re: Split brain - is it possible in hadoop?
In your example, you only have one active Name Node. So how would you encounter a 'split brain' scenario?
Maybe it would be better if you defined what you mean by a split brain?

-Mike

On Jun 18, 2012, at 8:30 PM, hdev ml wrote:

> All hadoop contributors/experts,
>
> I am trying to simulate split brain in our installation. There are a few
> things we want to know:
>
> 1. Does data corruption happen?
> 2. If yes to #1, how do we recover from it?
> 3. What corrective steps should we take in this situation, e.g. killing one
> namenode?
>
> To simulate this, I took the following steps.
>
> 1. We already have a healthy test cluster consisting of 4 machines. One
> machine runs the namenode and a datanode, another runs the secondarynamenode
> and a datanode, the 3rd runs the jobtracker and a datanode, and the 4th runs
> just a datanode.
> 2. Copied the hadoop installation folder to a new location on the datanode.
> 3. Kept all configurations the same in the hdfs-site and core-site xmls,
> except that I changed fs.default.name to a different URI.
> 4. The namenode directory, dfs.name.dir, pointed to the same shared
> NFS-mounted directory that the main namenode uses.
>
> I started this standby namenode using the following command:
> bin/hadoop-daemon.sh --config conf --hosts slaves start namenode
>
> It errored out saying that "the directory is already locked", which is
> expected behaviour. The directory has been locked by the original namenode.
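
To make the locking behaviour described above concrete, here is a minimal sketch of an exclusive, non-blocking lock on a storage directory. It is not Hadoop's code: the helper try_lock_dir and the lock file name are assumptions made for illustration, it uses a local temporary directory rather than NFS, and it relies on flock(), so it only runs on Unix-like systems. How such locks behave on an NFS mount depends on the NFS setup.

```python
import fcntl
import os
import tempfile


def try_lock_dir(storage_dir):
    """Try to take an exclusive, non-blocking lock on storage_dir.

    Returns the open lock file on success (keep it open to hold the lock),
    or None if another holder already has the lock.
    Hypothetical helper for illustration only, not a Hadoop API.
    """
    # The lock file name is an assumption chosen for this sketch.
    lock_path = os.path.join(storage_dir, "in_use.lock")
    lock_file = open(lock_path, "a+")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        lock_file.close()
        return None
    return lock_file


if __name__ == "__main__":
    name_dir = tempfile.mkdtemp()
    first = try_lock_dir(name_dir)    # plays the original namenode
    second = try_lock_dir(name_dir)   # plays the second namenode
    print("first holder got lock:", first is not None)
    print("second holder got lock:", second is not None)
    first.close()                     # closing the file releases the lock
```

The second caller is refused for as long as the first holder keeps its lock file open, which is the same kind of refusal as the "already locked" error above.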
>
> So I changed dfs.name.dir to some other folder and issued the same
> command. It failed with the message "namenode has not been formatted",
> which is also expected.
>
> This makes me think: does a split-brain situation really occur in hadoop?
>
> My understanding is that split brain happens because of timeouts on the
> main namenode. When the timeout occurs, the HA implementation (be it
> Linux HA, Veritas, etc.) thinks that the main namenode has died and tries
> to start the standby namenode. The standby namenode starts up, and then the
> main namenode comes back from the timeout phase and starts functioning as
> if nothing happened, giving rise to 2 namenodes in the cluster: split brain.
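
The timeout scenario described above can be sketched as a toy simulation. This is only a model of the sequence of events in that paragraph, not of how Linux HA, Veritas, or Hadoop actually behave. The Node class, the simulate function, and all parameters are hypothetical names chosen for this sketch, and the monitor here deliberately does nothing to stop the old primary.

```python
class Node:
    """A node that may or may not consider itself the active namenode."""

    def __init__(self, name, active=False):
        self.name = name
        self.active = active
        self.last_heartbeat = 0


def simulate(timeout=3, pause_start=2, pause_end=8, ticks=10):
    """Return the names of the nodes that consider themselves active at the end.

    The primary stops sending heartbeats during [pause_start, pause_end).
    A monitor promotes the standby once the heartbeat is older than timeout.
    Nothing in this toy model tells the primary to step down.
    """
    primary = Node("primary", active=True)
    standby = Node("standby")
    for now in range(1, ticks + 1):
        paused = pause_start <= now < pause_end
        if not paused:
            # The primary is running and still believes it is active.
            primary.last_heartbeat = now
        # Monitor: promote the standby when heartbeats have stopped too long.
        if not standby.active and now - primary.last_heartbeat > timeout:
            standby.active = True
    return [node.name for node in (primary, standby) if node.active]


if __name__ == "__main__":
    print(simulate())                            # long pause
    print(simulate(pause_start=2, pause_end=4))  # short pause
```

With the default values the pause outlasts the timeout, so both nodes end up active. With the shorter pause the standby is never promoted and only the primary stays active.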
>
> Considering the error messages and the above understanding, I cannot point
> 2 different namenodes to the same directory, because the main namenode,
> even when it isn't responding, still holds the lock on the directory.
>
> So can I safely conclude that split brain does not occur in hadoop?
>
> Or am I missing any other situation where split brain happens and the
> namenode directory is not locked, thus allowing the standby namenode also
> to start up?
>
> Has anybody encountered this?
>
> Any help is really appreciated.
>
> Harshad