Hadoop, mail # user - Re: Hang when add/remove a datanode into/from a 2 datanode cluster


Re: Hang when add/remove a datanode into/from a 2 datanode cluster
sam liu 2013-07-31, 06:39
I opened a jira for tracking this issue:
https://issues.apache.org/jira/browse/HDFS-5046
2013/7/2 sam liu <[EMAIL PROTECTED]>

> Yes, the default replication factor is 3. However, my case is strange:
> while the decommission hangs, I found that some blocks' expected replica
> count is 3, but the 'dfs.replication' value in hdfs-site.xml on every
> cluster node has been 2 since the cluster was first set up. Here are my
> steps:
>
> 1. Install a Hadoop 1.1.1 cluster with 2 datanodes, dn1 and dn2, and set
> 'dfs.replication' to 2 in hdfs-site.xml.
> 2. Add node dn3 to the cluster as a new datanode, leaving the
> 'dfs.replication' value in hdfs-site.xml at 2.
> Note: step 2 succeeded.
> 3. Decommission dn3 from the cluster.
> Expected result: dn3 is decommissioned successfully.
> Actual result:
> a). The decommission hangs and the status stays at 'Waiting DataNode
> status: Decommissioned'. But if I execute 'hadoop dfs -setrep -R 2 /', the
> decommission continues and eventually completes.
> b). However, if the initial cluster includes >= 3 datanodes, this issue
> does not occur when adding or removing another datanode. For example, if I
> set up a cluster with 3 datanodes, I can successfully add a 4th datanode
> and then successfully remove it from the cluster again.
>
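The 'hadoop dfs -setrep -R 2 /' workaround above resets every file under the root. A narrower variant is sketched here, assuming you already have a list of the affected paths: it only builds the per-file 'hadoop dfs -setrep' command lines and does not run them. The function name and the example paths are hypothetical.

```python
import shlex


def setrep_commands(paths, target=2):
    """Build one 'hadoop dfs -setrep' command line per path (nothing is executed)."""
    commands = []
    for path in paths:
        # shlex.quote protects paths containing spaces or shell metacharacters.
        commands.append("hadoop dfs -setrep {} {}".format(target, shlex.quote(path)))
    return commands


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for cmd in setrep_commands(["/user/sam/job.jar", "/tmp/my file"]):
        print(cmd)
```

Printing the commands first makes it possible to review them before anything on the cluster is changed.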
> I suspect it's a bug and plan to open a JIRA against Hadoop HDFS for this.
> Any comments?
>
> Thanks!
>
>
> 2013/6/21 Harsh J <[EMAIL PROTECTED]>
>
>> dfs.replication is a per-file parameter. If you have a client that does
>> not use the supplied configs, then its default replication is 3, and all
>> files it creates (as part of the app or via a job config) will have
>> replication factor 3.
>>
>> You can do an -lsr to find all files and filter which ones have been
>> created with a factor of 3 (versus expected config of 2).
>>
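A minimal sketch of the -lsr filtering suggested above, assuming the listing has been captured as text (for example from 'hadoop fs -lsr /') and that each line carries eight whitespace-separated columns with the replication factor in the second one. The function name, the sample listing, and its paths are hypothetical.

```python
def files_off_target(lsr_output, expected=2):
    """Return (path, replication) for files whose replication differs from `expected`."""
    hits = []
    for line in lsr_output.splitlines():
        # Assumed columns: perms, replication, owner, group, size, date, time, path
        fields = line.split(None, 7)
        if len(fields) < 8 or fields[0].startswith("d"):
            continue  # skip malformed lines and directories
        if not fields[1].isdigit():
            continue  # directories show '-' instead of a replication count
        replication = int(fields[1])
        if replication != expected:
            hits.append((fields[7], replication))
    return hits


# Hypothetical listing, for illustration only.
SAMPLE_LISTING = """\
drwxr-xr-x   - hadoop supergroup          0 2013-06-21 12:00 /user/sam
-rw-r--r--   2 hadoop supergroup       1024 2013-06-21 12:01 /user/sam/a.txt
-rw-r--r--   3 hadoop supergroup       2048 2013-06-21 12:02 /user/sam/job.jar
"""

if __name__ == "__main__":
    for path, replication in files_off_target(SAMPLE_LISTING, expected=2):
        print(replication, path)
```

Any path reported here was written by a client that did not pick up the cluster's configured value of 2.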
>> On Fri, Jun 21, 2013 at 3:13 PM, sam liu <[EMAIL PROTECTED]> wrote:
>> > Hi George,
>> >
>> > Actually, in my hdfs-site.xml, I always set 'dfs.replication' to 2, but
>> > I still encounter this issue.
>> >
>> > Thanks!
>> >
>> >
>> > 2013/6/21 George Kousiouris <[EMAIL PROTECTED]>
>> >>
>> >>
>> >> Hi,
>> >>
>> >> I think I have faced this before. The problem is that you have the
>> >> replication factor set to 3, so it seems to hang because it needs 3
>> >> nodes to achieve that factor (replicas are not created on the same
>> >> node). If you set the replication factor to 2, I think you will not
>> >> have this issue. So in general you must make sure that the replication
>> >> factor is <= the number of available datanodes.
>> >>
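The rule of thumb above can be written down as a small check. This is a simplified model of the explanation given in this thread, not the NameNode's actual logic: it assumes a file blocks the decommission whenever its expected replication exceeds the number of datanodes left after the decommissioning ones are excluded. The function name and file names are hypothetical.

```python
def files_stalling_decommission(file_replication, total_datanodes, decommissioning=1):
    """Return the files whose expected replication cannot be met by the remaining nodes.

    file_replication: dict mapping a file name to its expected replication factor.
    """
    # Replicas of a block are not placed on the same node, so each replica
    # needs its own remaining datanode.
    remaining = total_datanodes - decommissioning
    return sorted(name for name, rep in file_replication.items() if rep > remaining)


if __name__ == "__main__":
    # Hypothetical files: one written with the configured factor 2, one with
    # the client default of 3.
    files = {"a.txt": 2, "job.jar": 3}
    print(files_stalling_decommission(files, total_datanodes=3))  # dn1, dn2, dn3
    print(files_stalling_decommission(files, total_datanodes=4))
```

Under this model, removing dn3 from a 3-node cluster leaves 2 nodes, so a file expecting 3 replicas cannot be satisfied, whereas removing a 4th node from a 4-node cluster leaves 3 and nothing blocks.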
>> >> BR,
>> >> George
>> >>
>> >>
>> >> On 6/21/2013 12:29 PM, sam liu wrote:
>> >>
>> >> Hi,
>> >>
>> >> I encountered an issue that hangs the decommission operation. Steps to
>> >> reproduce:
>> >> 1. Install a Hadoop 1.1.1 cluster with 2 datanodes, dn1 and dn2, and set
>> >> 'dfs.replication' to 2 in hdfs-site.xml.
>> >> 2. Add node dn3 to the cluster as a new datanode, leaving the
>> >> 'dfs.replication' value in hdfs-site.xml at 2.
>> >> Note: step 2 succeeded.
>> >> 3. Decommission dn3 from the cluster.
>> >>
>> >> Expected result: dn3 could be decommissioned successfully
>> >>
>> >> Actual result: the decommission hangs and the status stays at
>> >> 'Waiting DataNode status: Decommissioned'.
>> >>
>> >> However, if the initial cluster includes >= 3 datanodes, this issue does
>> >> not occur when adding or removing another datanode.
>> >>
>> >> Also, after step 2, I noticed that some blocks' expected replica count is
>> >> 3, but the 'dfs.replication' value in hdfs-site.xml is always 2!
>> >>
>> >> Could anyone please help triage this?
>> >>
>> >> Thanks in advance!
>> >>
>> >>
>> >>
>> >> --
>> >> ---------------------------
>> >>
>> >> George Kousiouris, PhD
>> >> Electrical and Computer Engineer
>> >> Division of Communications,
>> >> Electronics and Information Engineering
>> >> School of Electrical and Computer Engineering