Re: Hadoop Failover and Recovery
On 8/28/09 8:58 PM, "sagar_shukla" <[EMAIL PROTECTED]> wrote:
>      What failover and recovery mechanisms are available for Hadoop? I
> searched the internet but could not find any good documentation for
> different scenarios, like a datanode going down or the namenode going down.

In most cases, the documentation for "fixing" Hadoop is:

A) fix hardware
B) clean out tmp files, etc
C) restart processes for that node

The name node is a bit of a special case. I'm amused that
http://wiki.apache.org/hadoop/NameNodeFailover is empty. :)

For the name node, you have some preventative things to do first:

A) have matching hardware available
B) make sure the fsimage and edits files are being written to that machine,
or are at least available to it, via NFS, SMB, whatever it takes
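
As a rough sketch of what checking B) might look like: the script below
reports whether a copy of the name node metadata contains both files and how
old each one is. The function name metadata_ages and the "current"
subdirectory are my assumptions for illustration, not something Hadoop
provides; adjust the layout to whatever your name directory actually looks
like.

```python
import os
import time

# File names taken from the post; the "current" subdirectory is an
# assumption about how the name directory is laid out.
REQUIRED_FILES = ("fsimage", "edits")


def metadata_ages(name_dir, now=None):
    """Map each required file to its age in seconds, or None if missing."""
    now = time.time() if now is None else now
    current = os.path.join(name_dir, "current")
    ages = {}
    for name in REQUIRED_FILES:
        path = os.path.join(current, name)
        if os.path.isfile(path):
            ages[name] = now - os.path.getmtime(path)
        else:
            ages[name] = None  # the copy is incomplete
    return ages
```

Run it against the NFS (or other) copy from cron and alert on any None. What
counts as "too old" depends on how busy the cluster is, so I left that
decision to the caller.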

On failure, use that backup image to bring the name node back up on your
spare box.
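
A minimal sketch of the copy step, assuming the backup is a complete copy of
the name directory and the spare box's name directory is empty or absent.
restore_name_dir is a hypothetical helper, not part of Hadoop, and it
deliberately refuses to overwrite anything that is already there.

```python
import os
import shutil


def restore_name_dir(backup_dir, name_dir):
    """Copy a saved name directory into place on the spare box."""
    if os.path.exists(name_dir):
        if os.listdir(name_dir):
            # Never clobber existing metadata; make the operator decide.
            raise RuntimeError("refusing to overwrite non-empty " + name_dir)
        os.rmdir(name_dir)  # copytree needs the target to be absent
    shutil.copytree(backup_dir, name_dir)
    return sorted(os.listdir(name_dir))
```

After the copy, start the name node process on the spare box the same way
you normally do (on many installs that is bin/hadoop-daemon.sh start
namenode, but check your own setup).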

Note that the NN isn't HA: it is a single point of failure with no built-in
failover.  I suspect something like SunCluster or VCS could be used here to
make it less susceptible to issues, but I don't know if anyone has tried it.