Can it notice the node is down sooner? If that node is serving an active
region (or if it's a datanode for an active region), that would be a
potentially large amount of downtime. With commodity hardware and a large
enough cluster, there will always be a machine or two being rebuilt...
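For context on the "10 min later" figure quoted below: the HDFS NameNode declares a DataNode dead only after twice the heartbeat recheck interval plus ten heartbeat intervals have passed without a heartbeat. The sketch below assumes Hadoop 1.x-era property names and defaults: `heartbeat.recheck.interval` in milliseconds (default 300000), which later releases rename to `dfs.namenode.heartbeat.recheck-interval`, and `dfs.heartbeat.interval` in seconds (default 3). The function name `dead_node_timeout_ms` is hypothetical and only illustrates the arithmetic.

```python
# Hypothetical helper showing how the HDFS dead-node timeout is derived.
# Assumed (Hadoop 1.x-era) properties and defaults:
#   heartbeat.recheck.interval -> milliseconds, default 300000 (5 min)
#   dfs.heartbeat.interval     -> seconds, default 3


def dead_node_timeout_ms(recheck_interval_ms: int = 300_000,
                         heartbeat_interval_s: int = 3) -> int:
    """Milliseconds after the last heartbeat before a DataNode is marked dead."""
    # 2 x recheck interval + 10 x heartbeat interval (heartbeat converted to ms)
    return 2 * recheck_interval_ms + 10 * 1000 * heartbeat_interval_s


if __name__ == "__main__":
    default = dead_node_timeout_ms()
    print(f"default: {default} ms = {default / 60_000:.1f} min")  # 630000 ms = 10.5 min

    # Lowering only the recheck interval to 60 s shortens detection considerably.
    faster = dead_node_timeout_ms(recheck_interval_ms=60_000)
    print(f"recheck 60 s: {faster} ms = {faster / 60_000:.1f} min")  # 150000 ms = 2.5 min
```

Lowering the recheck interval is the usual lever for noticing a dead DataNode sooner, at the cost of more spurious dead-node declarations (and unnecessary re-replication) during long GC pauses or brief network hiccups.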
On Thursday, June 21, 2012, Michael Segel wrote:
> Assuming that you have an Apache release (Apache, HW, Cloudera) ...
> (If MapR, replace the drive and you should be able to repair the cluster
> from the console. The node doesn't go down.)
> Node goes down.
> About 10 minutes later, the cluster sees the node as down. It should then
> re-replicate the missing blocks.
> Replace the disk with a new disk and rebuild the file system.
> Bring node up.
> Rebalance cluster.
> That should be pretty much it.
> On Jun 21, 2012, at 10:17 PM, David Charle wrote:
> > What is the best practice for removing a node and adding the same node
> > back in HBase/Hadoop?
> > Currently, 2 nodes in our 10-node cluster went down (bad disk; each node
> > is down because that disk holds both the root volume and data). We need to
> > replace the disks and add the nodes back. Any quick suggestions or pointers
> > to docs on the right procedure?
> > --
> > David