Re: Hang when add/remove a datanode into/from a 2 datanode cluster
The dfs.replication setting is a per-file parameter. If you have a client
that does not use the supplied configs, its default replication is 3, and
all files it creates (as part of the app or via a job config) will be
written with a replication factor of 3.
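
If you want to rule that out for a particular client or job, you can pin
the factor explicitly on the write path. A minimal sketch, assuming the
client honours the generic -D option (the file and path names below are
just placeholders):

  hadoop fs -D dfs.replication=2 -put somefile /tmp/repl-check
  hadoop fs -ls /tmp/repl-check    # second column is the replication factor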

You can do an -lsr to list all files and filter out the ones that were
created with a factor of 3 (versus the expected configuration of 2).
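
For example, something along these lines (assuming the stock -lsr output,
where the replication factor is the second column and the path is the last
field):

  hadoop fs -lsr / | awk '$2 == "3" { print $NF }'

and -setrep can then bring any stray files back down to 2:

  hadoop fs -setrep -w 2 /path/to/offending/file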

On Fri, Jun 21, 2013 at 3:13 PM, sam liu <[EMAIL PROTECTED]> wrote:
> Hi George,
>
> Actually, in my hdfs-site.xml, I always set 'dfs.replication' to 2. But I
> still encounter this issue.
>
> Thanks!
>
>
> 2013/6/21 George Kousiouris <[EMAIL PROTECTED]>
>>
>>
>> Hi,
>>
>> I think I have faced this before. The problem is that you have a
>> replication factor of 3, so the operation seems to hang because it needs 3
>> nodes to satisfy that factor (replicas are not placed on the same node). If
>> you set the replication factor to 2, I think you will not have this issue.
>> So in general you must make sure that the replication factor is <= the
>> number of available datanodes.
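>>
>> A quick way to confirm that this is what is happening (assuming fsck is
>> runnable from your client) is to watch the under-replicated block count:
>>
>>   hadoop fsck / | grep -i under-replicated
>>
>> It should drop to zero once the factor matches the number of live datanodes.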
>>
>> BR,
>> George
>>
>>
>> On 6/21/2013 12:29 PM, sam liu wrote:
>>
>> Hi,
>>
>> I encountered an issue which hangs the decommission operation. The steps:
>> 1. Install a Hadoop 1.1.1 cluster with 2 datanodes, dn1 and dn2, and, in
>> hdfs-site.xml, set 'dfs.replication' to 2
>> 2. Add node dn3 to the cluster as a new datanode, without changing the
>> 'dfs.replication' value in hdfs-site.xml (keep it at 2)
>> note: step 2 passed
>> 3. Decommission dn3 from the cluster
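>> (For reference, the usual Hadoop 1.x decommission route is to list dn3 in
>> the file pointed to by dfs.hosts.exclude and then run
>> 'hadoop dfsadmin -refreshNodes' on the namenode.)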
>>
>> Expected result: dn3 could be decommissioned successfully
>>
>> Actual result: the decommission progress hangs and the status always stays
>> at 'Waiting DataNode status: Decommissioned'
>>
>> However, if the initial cluster includes >= 3 datanodes, this issue is not
>> encountered when adding/removing another datanode.
>>
>> Also, after step 2, I noticed that the expected replica count of some
>> blocks is 3, even though the 'dfs.replication' value in hdfs-site.xml has
>> always been 2!
>>
>> Could anyone please help triage this?
>>
>> Thanks in advance!
>>
>>
>>
>> --
>> ---------------------------
>>
>> George Kousiouris, PhD
>> Electrical and Computer Engineer
>> Division of Communications,
>> Electronics and Information Engineering
>> School of Electrical and Computer Engineering
>> Tel: +30 210 772 2546
>> Mobile: +30 6939354121
>> Fax: +30 210 772 2569
>> Email: [EMAIL PROTECTED]
>> Site: http://users.ntua.gr/gkousiou/
>>
>> National Technical University of Athens
>> 9 Heroon Polytechniou str., 157 73 Zografou, Athens, Greece
>
>

--
Harsh J