Re: Hadoop Failover and Recovery
Allen Wittenauer 2009-08-31, 17:01
On 8/28/09 8:58 PM, "sagar_shukla" <[EMAIL PROTECTED]> wrote:
> What are the failover and recovery mechanisms available for Hadoop ? I
> searched over the internet but could not find any good documentation for
> different scenarios like datanode going down or namenode going down.

In most cases, the documentation for "fixing" Hadoop is:

A) fix hardware
B) clean out tmp files, etc
C) restart processes for that node

Name node is a bit of a special case. I'm amused that
http://wiki.apache.org/hadoop/NameNodeFailover is empty. :)

For name node, you have some preventative things to do first:

A) have matching hardware available
B) make sure the fsimage and edits files are being written to, or are at
   least available to, that machine via NFS, SMB, whatever it takes

On failure, use that backup image to bring the name node back up on your
spare box.

Note that the NN isn't HA. I suspect something like SunCluster or VCS could
be used here to make it less susceptible to issues, but I don't know if
anyone has tried it.
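
As a rough illustration of preventative step B, here is a minimal sketch that copies the name node's fsimage and edits files to a location the spare box can reach (for example an NFS mount) and checks the copies by checksum. It is not an official Hadoop tool. The function name, the backup path, and the assumption that both files live under a "current" subdirectory of the name directory are illustrative and should be checked against your own installation. Copying files from a running name node may also capture an inconsistent pair, so treat this as a supplement to having the name node itself write to more than one directory (the dfs.name.dir setting takes a comma-separated list), not a replacement for it.

```python
import hashlib
import shutil
from pathlib import Path

# Assumed file names inside <name dir>/current; verify on your own cluster.
METADATA_FILES = ("fsimage", "edits")


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_namenode_metadata(name_dir: Path, backup_dir: Path) -> dict:
    """Copy fsimage and edits to backup_dir and verify each copy.

    Hypothetical helper: returns a mapping of file name to the checksum
    of the copied file, and raises if a file is missing or a copy differs.
    """
    source = name_dir / "current"
    backup_dir.mkdir(parents=True, exist_ok=True)
    checksums = {}
    for name in METADATA_FILES:
        src = source / name
        if not src.is_file():
            raise FileNotFoundError(f"missing metadata file: {src}")
        dst = backup_dir / name
        shutil.copy2(src, dst)  # copy2 also preserves timestamps
        copied = sha256_of(dst)
        if sha256_of(src) != copied:
            raise IOError(f"checksum mismatch after copying {src}")
        checksums[name] = copied
    return checksums
```

Called as backup_namenode_metadata(Path("/hadoop/name"), Path("/mnt/nn-backup")), with both paths being placeholders, it could be run from cron so that the spare box always has a recent image to start from.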