MapReduce >> mail # user >> Re: Hang when add/remove a datanode into/from a 2 datanode cluster


Re: Hang when add/remove a datanode into/from a 2 datanode cluster
Yes, the default replication factor is 3. However, my case is strange:
while the decommission hangs, I found that some blocks have an expected
replica count of 3, even though the 'dfs.replication' value in hdfs-site.xml
on every cluster node has been 2 since the cluster was first set up. Here
are my steps:
1. Install a Hadoop 1.1.1 cluster with 2 datanodes, dn1 and dn2, and set
'dfs.replication' to 2 in hdfs-site.xml
2. Add node dn3 to the cluster as a new datanode, without changing the
'dfs.replication' value in hdfs-site.xml (it stays at 2)
note: step 2 succeeded
3. Decommission dn3 from the cluster
Expected result: dn3 is decommissioned successfully
Actual result:
a). The decommission hangs and the status always stays at 'Waiting DataNode
status: Decommissioned'. But if I execute 'hadoop dfs -setrep -R 2 /', the
decommission continues and eventually completes.
b). However, if the initial cluster includes >= 3 datanodes, this issue
does not occur when adding or removing another datanode. For example, if I
set up a cluster with 3 datanodes, I can successfully add a 4th datanode
and then successfully remove it from the cluster again.
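
Results (a) and (b) are consistent with a simple rule: the decommission can
only finish once every block's expected replica count fits on the datanodes
that remain. The Python sketch below is a toy model of that rule only, not the
NameNode's actual logic; the function names and the block IDs are made up for
illustration.

```python
def blocks_that_cannot_be_satisfied(expected_replicas, remaining_datanodes):
    """Return the blocks whose expected replica count exceeds the number of
    datanodes left after the decommission (each node holds at most one
    replica of a given block)."""
    return {
        block: replicas
        for block, replicas in expected_replicas.items()
        if replicas > remaining_datanodes
    }


def can_finish_decommission(expected_replicas, remaining_datanodes):
    """The decommission can finish only if no block is left unsatisfiable."""
    return not blocks_that_cannot_be_satisfied(expected_replicas, remaining_datanodes)


# Hypothetical block IDs: one block was written with factor 3 despite the
# cluster-side 'dfs.replication' of 2.
blocks = {"blk_1": 2, "blk_2": 3}

# dn1 + dn2 + dn3, decommission dn3 -> 2 nodes remain, blk_2 wants 3.
print(can_finish_decommission(blocks, 2))

# Same removal after 'hadoop dfs -setrep -R 2 /' lowered every file to 2.
print(can_finish_decommission({b: 2 for b in blocks}, 2))

# 4-node cluster, decommission one node -> 3 nodes remain.
print(can_finish_decommission(blocks, 3))
```

Under this model only the first case is stuck, which matches what I see: the
hang with 2 remaining datanodes, the recovery after -setrep, and no hang when
3 datanodes remain.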

I suspect this is a bug and plan to open a JIRA against Hadoop HDFS for it.
Any comments?

Thanks!
2013/6/21 Harsh J <[EMAIL PROTECTED]>

> The dfs.replication setting is a per-file parameter. If you have a client
> that does not use the supplied configs, then its default replication is 3
> and all files it creates (as part of the app or via a job config) will
> have a replication factor of 3.
>
> You can do an -lsr to find all files and filter which ones have been
> created with a factor of 3 (versus expected config of 2).
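
The -lsr filter suggested above could be scripted roughly as follows. This is
a sketch in Python that assumes the usual 'hadoop fs -lsr' column layout
(permissions, replication, owner, group, size, date, time, path, with "-" as
the replication of a directory). The helper name and the sample listing are
invented for illustration and are not output from my cluster.

```python
def files_with_replication(lsr_output, factor):
    """Return the paths in 'hadoop fs -lsr' style output whose replication
    column equals `factor`. Directories and malformed lines are skipped."""
    matches = []
    for line in lsr_output.splitlines():
        # Split into at most 8 fields so a path containing spaces stays whole.
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        permissions, replication, path = fields[0], fields[1], fields[7]
        if permissions.startswith("d") or not replication.isdigit():
            continue  # directories show "-" in the replication column
        if int(replication) == factor:
            matches.append(path)
    return matches


# Made-up sample listing; in practice this would be the captured output of
# 'hadoop fs -lsr /'.
SAMPLE = """\
drwxr-xr-x   - hadoop supergroup          0 2013-06-21 10:00 /user
-rw-r--r--   2 hadoop supergroup       1024 2013-06-21 10:01 /user/a.txt
-rw-r--r--   3 hadoop supergroup       2048 2013-06-21 10:02 /user/b.txt
"""

if __name__ == "__main__":
    for path in files_with_replication(SAMPLE, 3):
        print(path)
```

Any path it reports was created with a factor of 3 regardless of the value in
hdfs-site.xml, so those are the files to look at first.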
>
> On Fri, Jun 21, 2013 at 3:13 PM, sam liu <[EMAIL PROTECTED]> wrote:
> > Hi George,
> >
> > Actually, in my hdfs-site.xml, I always set 'dfs.replication' to 2, but
> > I still encounter this issue.
> >
> > Thanks!
> >
> >
> > 2013/6/21 George Kousiouris <[EMAIL PROTECTED]>
> >>
> >>
> >> Hi,
> >>
> >> I think I have faced this before. The problem is that you have the rep
> >> factor=3, so it seems to hang because it needs 3 nodes to achieve the
> >> factor (replicas are not created on the same node). If you set the
> >> replication factor=2, I think you will not have this issue. So in general
> >> you must make sure that the rep factor is <= the number of available
> >> datanodes.
> >>
> >> BR,
> >> George
> >>
> >>
> >> On 6/21/2013 12:29 PM, sam liu wrote:
> >>
> >> Hi,
> >>
> >> I encountered an issue which hangs the decommission operation. The steps:
> >> 1. Install a Hadoop 1.1.1 cluster with 2 datanodes, dn1 and dn2, and set
> >> 'dfs.replication' to 2 in hdfs-site.xml
> >> 2. Add node dn3 to the cluster as a new datanode, without changing the
> >> 'dfs.replication' value in hdfs-site.xml (it stays at 2)
> >> note: step 2 passed
> >> 3. Decommission dn3 from the cluster
> >>
> >> Expected result: dn3 could be decommissioned successfully
> >>
> >> Actual result: the decommission hangs and the status always stays at
> >> 'Waiting DataNode status: Decommissioned'
> >>
> >> However, if the initial cluster includes >= 3 datanodes, this issue
> >> does not occur when adding or removing another datanode.
> >>
> >> Also, after step 2, I noticed that some blocks' expected replica count
> >> is 3, but the 'dfs.replication' value in hdfs-site.xml is always 2!
> >>
> >> Could anyone please help triage this?
> >>
> >> Thanks in advance!
> >>
> >>
> >>
> >> --
> >> ---------------------------
> >>
> >> George Kousiouris, PhD
> >> Electrical and Computer Engineer
> >> Division of Communications,
> >> Electronics and Information Engineering
> >> School of Electrical and Computer Engineering
> >> Tel: +30 210 772 2546
> >> Mobile: +30 6939354121
> >> Fax: +30 210 772 2569
> >> Email: [EMAIL PROTECTED]
> >> Site: http://users.ntua.gr/gkousiou/
> >>
> >> National Technical University of Athens
> >> 9 Heroon Polytechniou str., 157 73 Zografou, Athens, Greece