HDFS, mail # user - How to fix a corrupted disk?


Re: How to fix a corrupted disk?
Allen Wittenauer 2010-06-10, 14:04

On Jun 9, 2010, at 10:13 PM, Sean Bigdatafun wrote:

> I have two questions here about an HDFS cell. Suppose the file that I am interested in is stored on 3 datanodes: A, B, and C. If A suddenly crashes, I understand I can still read my file because two copies are still available. But which software module is responsible for bringing A back up? (Is there a watchdog server?)
>  

No, there is not a watchdog.  Each installation is slightly different, and (almost) every OS provides facilities to keep a daemon running continually (SMF, launchd, daemontools, etc.).  In most installations, I suspect wetware is used to bring back dead datanode processes so that the reason for the crash can be investigated.
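To make the "keep a daemon running" idea concrete, here is a minimal sketch of what such a supervisor does. It is not part of Hadoop and not how SMF, launchd, or daemontools are implemented; the function name `supervise` and its parameters are made up for illustration. It assumes the supervised command stays in the foreground (a wrapper script that backgrounds the process and returns immediately would defeat it).

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, delay_seconds=5.0):
    """Run cmd, relaunching it whenever it exits with a non-zero status.

    Hypothetical helper for illustration only. Returns the list of exit
    codes observed, one per launch.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        # Blocks until the child exits, so cmd must run in the foreground.
        code = subprocess.call(cmd)
        exit_codes.append(code)
        if code == 0:
            break  # clean shutdown: do not restart
        # Log every crash so a human can still investigate the cause.
        print(f"{cmd[0]} exited with status {code} (launch {attempt + 1})",
              file=sys.stderr)
        if attempt < max_restarts:
            time.sleep(delay_seconds)
    return exit_codes
```

Capping the restarts is deliberate: a datanode that keeps dying usually needs a person to look at it, which is the point about wetware above. On a real system, the OS facility is the better choice over a hand-rolled loop like this.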

> Furthermore, if the disk on server A is totally corrupted (disk failure), what should I do to bring my file back to a replication factor of 3?

Fix the disk on A and restart the datanode process.

When you have more than 3 datanodes, the namenode will automatically re-replicate any under-replicated blocks, provided there is a node that is qualified to hold the new replica.  [In other words, if you have a grid large enough to support topology, the namenode will not violate topology just to replicate a block.  It is expected that there are enough nodes in enough racks to avoid policy violations.]
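If you want to watch re-replication catch up, `hadoop fsck /` (`hdfs fsck /` in later releases) prints a summary that includes an under-replicated block count. The sketch below pulls that number out of the report text. The helper name is hypothetical, and the exact wording of the summary line ("Under-replicated blocks:") is an assumption about the fsck output that you should check against your own version.

```python
import re


def under_replicated_blocks(fsck_output):
    """Return the under-replicated block count from fsck report text.

    Hypothetical helper: assumes the report contains a line of the form
    "Under-replicated blocks: <count> (<percent> %)". Returns None if no
    such line is found.
    """
    match = re.search(r"Under-replicated blocks:\s*(\d+)", fsck_output)
    return int(match.group(1)) if match else None
```

You would feed it the captured output of the fsck command, for example from `subprocess.run(["hadoop", "fsck", "/"], capture_output=True, text=True).stdout`, and poll until the count drops to 0.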