Hadoop, mail # user - Split brain - is it possible in hadoop?


Re: Split brain - is it possible in hadoop?
Michael Segel 2012-06-19, 12:47
In your example, you only have one active Name Node. So how would you encounter a 'split brain' scenario?
Maybe it would be better if you defined what you mean by a split brain?

-Mike

On Jun 18, 2012, at 8:30 PM, hdev ml wrote:

> All hadoop contributors/experts,
>
> I am trying to simulate split brain in our installation. There are a few
> things we want to know:
>
> 1. Does data corruption happen?
> 2. If Yes in #1, how to recover from it.
> 3. What are the corrective steps to take in this situation, e.g. killing one
> namenode?
>
> So to simulate this I took the following steps.
>
> 1. We already have a healthy test cluster, consisting of 4 machines. One
> machine runs namenode and a datanode, other machine runs secondarynamenode
> and a datanode, 3rd runs jobtracker and a datanode, and 4th one just a
> datanode.
> 2. Copied the hadoop installation folder to a new location in the datanode.
> 3. Kept all configurations the same in the hdfs-site and core-site XMLs,
> except that I changed fs.default.name to a different URI.
> 4. The namenode directory, dfs.name.dir, pointed to the same shared
> NFS-mounted directory that the main namenode points to.
>
> I started this standby namenode using the following command:
> bin/hadoop-daemon.sh --config conf --hosts slaves start namenode
>
> It errored out saying that "the directory is already locked", which is an
> expected behaviour. The directory has been locked by the original namenode.
>
> So I changed the dfs.name.dir to some other folder, and issued the same
> command. It fails with message - "namenode has not been formatted", which
> is also expected.
>
> This makes me think: does a split-brain situation really occur in hadoop?
>
> My understanding is that split brain happens because of timeouts on the
> main namenode. The way it happens is, when the timeout occurs, the HA
> implementation (be it Linux HA, Veritas, etc.) thinks that the main
> namenode has died and tries to start the standby namenode. The standby
> namenode starts up and then main namenode comes back from the timeout phase
> and starts functioning as if nothing happened, giving rise to 2 namenodes
> in the cluster - Split Brain.
>
> Considering the error messages and the above understanding, I cannot point
> 2 different namenodes to same directory, because the main namenode isn't
> responding but has locked the directory.
>
> So can I safely conclude that split brain does not occur in hadoop?
>
> Or am I missing any other situation where split brain happens and the
> namenode directory is not locked, thus allowing the standby namenode also
> to start up?
>
> Has anybody encountered this?
>
> Any help is really appreciated.
>
> Harshad
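The "directory is already locked" error quoted above is what you would expect when a process guards its storage directory with an exclusive lock file. Below is a minimal sketch of that idea in Python, not Hadoop's actual code. The helper name try_lock_dir and the lock file name in_use.lock are assumptions made for illustration, and the sketch uses flock() on a local temporary directory (Unix only), so it says nothing about how locks behave on an NFS mount.

```python
import fcntl
import os
import tempfile


def try_lock_dir(storage_dir):
    """Try to take an exclusive, non-blocking lock on storage_dir/in_use.lock.

    Returns the open file object (keep it open to hold the lock), or None
    if some other holder already has the lock. Hypothetical helper.
    """
    lock_path = os.path.join(storage_dir, "in_use.lock")
    handle = open(lock_path, "a+")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


if __name__ == "__main__":
    name_dir = tempfile.mkdtemp()       # stands in for dfs.name.dir
    first = try_lock_dir(name_dir)      # plays the original namenode
    second = try_lock_dir(name_dir)     # plays the copied installation
    print("first holder got the lock:", first is not None)
    print("second holder got the lock:", second is not None)

    first.close()                       # closing the file releases the lock
    third = try_lock_dir(name_dir)
    print("lock available after release:", third is not None)
    third.close()
```

The point of the sketch is that the lock only protects the directory while the first holder keeps the file open. A holder that is slow or unresponsive but still alive keeps the lock, which matches the behaviour described in the experiment.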