Hadoop, mail # user - Re: Decommissioning Nodes in Production Cluster.

sudhakara st 2013-02-12, 15:30
The decommissioning process is controlled by an exclude file, which for
HDFS is set by the *dfs.hosts.exclude* property, and for MapReduce by the
*mapred.hosts.exclude* property. In most cases, there is one shared file,
referred to as the exclude file. The name of this exclude file must be set in
the *dfs.hosts.exclude* configuration parameter when the namenode starts up.
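As an illustration, the two properties might be set as shown here. This is a minimal sketch, assuming a Hadoop 1.x style layout where the HDFS property lives in hdfs-site.xml and the MapReduce property in mapred-site.xml. The path /etc/hadoop/conf/excludes is a hypothetical placeholder, not something taken from the original post.

```xml
<!-- hdfs-site.xml, read by the namenode -->
<property>
  <name>dfs.hosts.exclude</name>
  <value>/etc/hadoop/conf/excludes</value> <!-- hypothetical path -->
</property>

<!-- mapred-site.xml, read by the jobtracker -->
<property>
  <name>mapred.hosts.exclude</name>
  <value>/etc/hadoop/conf/excludes</value> <!-- same shared file (assumption) -->
</property>
```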
To remove nodes from the cluster:
1. Add the network addresses of the nodes to be decommissioned to the
exclude file.

2. Restart the MapReduce cluster to stop the tasktrackers on the nodes being
decommissioned.
3. Update the namenode with the new set of permitted datanodes, with this
command:
% hadoop dfsadmin -refreshNodes
4. Go to the web UI and check whether the admin state has changed to
“Decommission In Progress” for the datanodes being decommissioned. They will
start copying their blocks to other datanodes in the cluster.

5. When all the datanodes report their state as “Decommissioned,” then all
the blocks have been replicated. Shut down the decommissioned nodes.
6. Remove the nodes from the include file, and run:
% hadoop dfsadmin -refreshNodes
7. Remove the nodes from the slaves file.
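
Steps 1, 3, 4 and 5 of the list above can be sketched as a few shell helpers. This is only a sketch under stated assumptions: the exclude file path and the helper names (add_to_exclude, refresh_nodes, summarize_status) are hypothetical, and the status summary assumes that `hadoop dfsadmin -report` prints one "Decommission Status : <state>" line per datanode, which is worth verifying against your Hadoop version.

```shell
#!/bin/sh
# Hypothetical path; use the file named by dfs.hosts.exclude on your namenode.
EXCLUDE_FILE="${EXCLUDE_FILE:-/tmp/dfs.exclude.example}"

# Step 1: add a host to the exclude file, skipping it if already listed.
add_to_exclude() {
    touch "$EXCLUDE_FILE"
    if ! grep -qxF "$1" "$EXCLUDE_FILE"; then
        echo "$1" >> "$EXCLUDE_FILE"
    fi
}

# Step 3: ask the namenode to re-read its include/exclude files.
# Does nothing except print a note if the hadoop command is not available.
refresh_nodes() {
    if command -v hadoop >/dev/null 2>&1; then
        hadoop dfsadmin -refreshNodes
    else
        echo "hadoop not on PATH; skipping refreshNodes" >&2
    fi
}

# Steps 4-5: count datanodes per decommission state from report text on stdin.
# Assumes lines of the form "Decommission Status : <state>".
summarize_status() {
    grep 'Decommission Status' | sed 's/.*: *//' | sort | uniq -c
}
```

A possible use would be `add_to_exclude datanode07.example.com && refresh_nodes` (hostname made up), followed later by `hadoop dfsadmin -report | summarize_status` to see how many nodes are still in progress before shutting anything down.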

Decommissioning datanodes in small batches (less than 2% of the cluster at a
time) should not have any noticeable effect on the cluster. Still, it is
better to pause MR jobs before you trigger the decommission, to ensure that
no tasks are running on the nodes being decommissioned.
If only a very small percentage of tasks is running on a decommissioning
node, they can be resubmitted to other tasktrackers, but if the percentage of
queued jobs is larger than the threshold, there is a chance of job failure.
Once you have run the 'hadoop dfsadmin -refreshNodes' command and the
decommission has started, you can resume the MR jobs.
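
To make the "less than 2%" rule of thumb concrete, here is a small arithmetic sketch. The helper name max_batch is hypothetical, and the 2% default is only the figure suggested above, not a limit enforced by Hadoop.

```shell
#!/bin/sh
# Largest number of nodes that stays strictly below PERCENT of the cluster.
# Hypothetical helper; PERCENT defaults to the 2% rule of thumb.
max_batch() {
    total=$1
    percent=${2:-2}
    # Largest k with k*100 < total*percent, using integer arithmetic only.
    echo $(( (total * percent - 1) / 100 ))
}
```

For example, `max_batch 200` prints 3, since 4 nodes would be exactly 2% of a 200-node cluster. On a cluster of 50 nodes or fewer the result is 0, meaning even a single node is at or above the 2% mark, so pausing jobs first matters more there.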

*Source: Hadoop: The Definitive Guide [Tom White]*

On Tuesday, February 12, 2013 5:20:07 PM UTC+5:30, Dhanasekaran Anbalagan wrote:
> Hi Guys,
> I need to remove one of the datanodes from a production cluster, and it is
> recommended to do this by decommissioning that datanode. Please guide me.
> -Dhanasekaran,