

sam liu 2013-07-31, 06:39
Harsh J 2013-07-31, 16:56
Re: Hang when add/remove a datanode into/from a 2 datanode cluster
But please note that the value of 'dfs.replication' on the cluster is
always 2, even when the datanode count is 3. And I am pretty sure I did
not manually create any files with repl=3. So why were some HDFS files
created with repl=3 rather than repl=2?
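
As a side note, a file's recorded replication factor can be checked with
something like this (assuming the Hadoop 1.x fs shell; the path is a
placeholder):

  hadoop fs -stat %r /path/to/file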
2013/8/1 Harsh J <[EMAIL PROTECTED]>

> Step (a) points to both your problem and its solution. You have files
> being created with repl=3 on a 2-DN cluster, which will prevent
> decommission. This is not a bug.
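>
> To confirm, something like this (a rough check; fsck prints the expected
> replication of each block) should show which files carry repl=3:
>
>   hadoop fsck / -files -blocks | grep 'repl=3'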
>
> On Wed, Jul 31, 2013 at 12:09 PM, sam liu <[EMAIL PROTECTED]> wrote:
> > I opened a jira for tracking this issue:
> > https://issues.apache.org/jira/browse/HDFS-5046
> >
> >
> > 2013/7/2 sam liu <[EMAIL PROTECTED]>
> >>
> >> Yes, the default replication factor is 3. However, in my case it's
> >> strange: while the decommission hangs, I found some blocks' expected
> >> replica count is 3, but the 'dfs.replication' value in hdfs-site.xml
> >> of every cluster node has always been 2 since the cluster was set up.
> >> Below are my steps:
> >>
> >> 1. Install a Hadoop 1.1.1 cluster with 2 datanodes, dn1 and dn2, and
> >> in hdfs-site.xml set 'dfs.replication' to 2
> >> 2. Add node dn3 into the cluster as a new datanode, without changing
> >> the 'dfs.replication' value in hdfs-site.xml (keep it as 2)
> >> note: step 2 passed
> >> 3. Decommission dn3 from the cluster
> >> Expected result: dn3 is decommissioned successfully
> >> Actual result:
> >> a). The decommission progress hangs and the status is always
> >> 'Waiting DataNode status: Decommissioned'. But if I execute
> >> 'hadoop dfs -setrep -R 2 /', the decommission continues and
> >> eventually completes (see the full command after these steps).
> >> b). However, if the initial cluster includes >= 3 datanodes, this
> >> issue is not encountered when adding/removing another datanode. For
> >> example, if I set up a cluster with 3 datanodes, I can successfully
> >> add a 4th datanode into it, and then also successfully remove the
> >> 4th datanode from the cluster.
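> >>
> >> For reference, the workaround from a) in full (a sketch, not verified
> >> here; the -w flag, if your release supports it, makes the command
> >> wait until re-replication finishes):
> >>
> >>   hadoop dfs -setrep -R -w 2 /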
> >>
> >> I suspect it's a bug and plan to open a JIRA against Hadoop HDFS for
> >> this. Any comments?
> >>
> >> Thanks!
> >>
> >>
> >> 2013/6/21 Harsh J <[EMAIL PROTECTED]>
> >>>
> >>> The dfs.replication setting is a per-file parameter. If you have a
> >>> client that does not use the supplied configs, then its default
> >>> replication is 3, and all files it creates (as part of the app or
> >>> via a job config) will be created with replication factor 3.
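> >>>
> >>> For example (a rough sketch; this assumes the Hadoop 1.x shell
> >>> accepts generic -D options, and localfile / the target path are
> >>> placeholders), a client can pin the factor per command:
> >>>
> >>>   hadoop fs -Ddfs.replication=2 -put localfile /user/sam/file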
> >>>
> >>> You can do an -lsr to find all files and filter which ones have been
> >>> created with a factor of 3 (versus expected config of 2).
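> >>>
> >>> Something like this should list them (untested; in -lsr output the
> >>> second column is the replication factor, '-' for directories):
> >>>
> >>>   hadoop fs -lsr / | awk '$2 == "3" {print $NF}'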
> >>>
> >>> On Fri, Jun 21, 2013 at 3:13 PM, sam liu <[EMAIL PROTECTED]> wrote:
> >>> > Hi George,
> >>> >
> >>> > Actually, in my hdfs-site.xml, I always set 'dfs.replication' to
> >>> > 2, but I still encounter this issue.
> >>> >
> >>> > Thanks!
> >>> >
> >>> >
> >>> > 2013/6/21 George Kousiouris <[EMAIL PROTECTED]>
> >>> >>
> >>> >>
> >>> >> Hi,
> >>> >>
> >>> >> I think I have faced this before. The problem is that you have
> >>> >> the rep factor=3, so it seems to hang because it needs 3 nodes to
> >>> >> achieve the factor (replicas are not created on the same node).
> >>> >> If you set the replication factor=2, I think you will not have
> >>> >> this issue. So in general you must make sure that the rep factor
> >>> >> is <= the available datanodes.
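> >>> >>
> >>> >> A quick way to check live datanodes against the configured factor
> >>> >> (rough sketch, Hadoop 1.x):
> >>> >>
> >>> >>   hadoop dfsadmin -report | grep 'Datanodes available'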
> >>> >>
> >>> >> BR,
> >>> >> George
> >>> >>
> >>> >>
> >>> >> On 6/21/2013 12:29 PM, sam liu wrote:
> >>> >>
> >>> >> Hi,
> >>> >>
> >>> >> I encountered an issue which hangs the decommission operation.
> >>> >> Its steps:
> >>> >> 1. Install a Hadoop 1.1.1 cluster with 2 datanodes, dn1 and dn2,
> >>> >> and in hdfs-site.xml set the 'dfs.replication' to 2
> >>> >> 2. Add node dn3 into the cluster as a new datanode, without
> >>> >> changing the 'dfs.replication' value in hdfs-site.xml (keep it
> >>> >> as 2)
> >>> >> note: step 2 passed
> >>> >> 3. Decommission dn3 from the cluster
Harsh J 2013-08-01, 03:11