-
Decommissioning runs for ever
Chandra Mohan, Ananda Vel... 2012-08-07, 00:59
Hi,
I tried decommissioning a node in my Hadoop cluster. I am running Apache Hadoop 1.0.2 on a four-node cluster. HBase is also installed on the cluster, and I have shut down the region server on this node.

To decommission the node, I did the following:

* Added the following XML to hdfs-site.xml:

    <property>
      <name>dfs.hosts.exclude</name>
      <value>/full/path/of/host/exclude/file</value>
    </property>

* Ran "<HADOOP_HOME>/bin/hadoop dfsadmin -refreshNodes"

But the decommissioning has been running for the last 6 hours, and I don't know when it will finish. I need this node for other activities.

From the HDFS health status JSP:

Cluster Summary
338 files and directories, 200 blocks = 538 total. Heap Size is 16.62 MB / 888.94 MB (1%)

Configured Capacity               : 1.35 TB
DFS Used                          : 759.57 MB
Non DFS Used                      : 179.36 GB
DFS Remaining                     : 1.17 TB
DFS Used%                         : 0.05 %
DFS Remaining%                    : 86.92 %
Live Nodes                        : 4
Dead Nodes                        : 0
Decommissioning Nodes             : 1
Number of Under-Replicated Blocks : 129

Please share any ideas you have. Thanks a lot.

Regards,
Anand.C
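The two steps described above can be sketched as a small shell script. This is only a sketch under assumptions: the hostname `dn4.example.com` is a hypothetical placeholder, `EXCLUDE_FILE` defaults to the placeholder path from the post, and `HADOOP_HOME` is assumed to point at the Hadoop install. The exclude file lists one hostname per line, and `hadoop dfsadmin -report` is used here only as one way to watch the node's status afterwards.

```shell
#!/bin/sh
# Sketch only: EXCLUDE_FILE and the hostname in the usage note are placeholders.
EXCLUDE_FILE="${EXCLUDE_FILE:-/full/path/of/host/exclude/file}"

# Add a host to the exclude file, once (one hostname per line).
exclude_host() {
    host="$1"
    touch "$EXCLUDE_FILE"
    grep -qxF "$host" "$EXCLUDE_FILE" || echo "$host" >> "$EXCLUDE_FILE"
}

# Ask the NameNode to re-read the file named by dfs.hosts.exclude,
# then print the per-node report.
refresh_and_report() {
    "${HADOOP_HOME}/bin/hadoop" dfsadmin -refreshNodes &&
    "${HADOOP_HOME}/bin/hadoop" dfsadmin -report
}

# Usage (on the NameNode host):
#   exclude_host dn4.example.com
#   refresh_and_report
```

The functions are only defined, not called, so nothing touches the cluster until they are invoked explicitly. The `grep -qxF` check keeps the exclude file free of duplicate entries if the script is run more than once.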
-
Re: Decommissioning runs for ever
Michael Segel 2012-08-07, 01:06
Did you change the background bandwidth from 10 MB/sec to something higher?

Worst case, you can kill the DN and wait 10 minutes for the cluster to realize it's down and then rebalance. (It's ugly, but it works.)
-
RE: Decommissioning runs for ever
Chandra Mohan, Ananda Vel... 2012-08-07, 01:15
Are you referring to the dfs.balance.bandwidthPerSec setting?
-
Re: Decommissioning runs for ever
Michael Segel 2012-08-07, 01:26
Yup.
By default it looks like 10 MB/sec. With 1 GbE, you could probably push this up to 100 MB/sec or even higher, depending on your cluster usage. With 10 GbE, obviously higher.
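As a rough illustration of the setting discussed above, the following sketch converts a bandwidth in MB/sec to the bytes-per-second value that dfs.balance.bandwidthPerSec takes and prints the matching hdfs-site.xml property. The function names are made up for this example, and the 100 MB/sec figure is simply the number suggested in the reply, not a recommendation. Note that the property is documented as the bandwidth each DataNode may use for balancing, so how much it helps a decommission may vary.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; 100 MB/sec is the figure from the thread.

# Convert megabytes per second to the bytes-per-second value the property expects.
mb_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# Print an hdfs-site.xml property block for the given bandwidth in MB/sec.
bandwidth_property() {
    cat <<EOF
<property>
  <name>dfs.balance.bandwidthPerSec</name>
  <value>$(mb_to_bytes "$1")</value>
</property>
EOF
}

# Rough copy time in seconds: data size (MB) divided by bandwidth (MB/sec).
copy_seconds() {
    echo $(( $1 / $2 ))
}

bandwidth_property 100
```

With the roughly 760 MB of DFS Used reported in the original post, `copy_seconds 760 10` gives 76, so even at 10 MB/sec the copy itself would take on the order of minutes. A six-hour wait therefore hints that something other than bandwidth may be holding the decommission up.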
-
RE: Decommissioning runs for ever
Chandra Mohan, Ananda Vel... 2012-08-07, 04:17
Hi,

I had to resort to the ugly method after trying the bandwidth settings. The node that I wanted to decommission is now dead. Should I restart the cluster now? Should I update the slaves file before restarting? Is there anything else I should take care of?

Regards,
Anand.C
-
Re: Decommissioning runs for ever
Martin Gerlach 2012-08-08, 15:35
Hi,
Sometimes, when decommissioning hangs with a number of under-replicated blocks, the reason is that there are blocks/files with a replication value greater than the number of data nodes in the cluster. This can be fixed by running "sudo -u hdfs fsck /" to find the files, then "hadoop fs -setrep -R <num> <path>" to set the replication to a value <num> that is less than or equal to the number of remaining data nodes.

Cheers,
Martin
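The fix described above can be sketched as follows. This is a sketch under assumptions: `REMAINING_DATANODES` defaults to 3 (a four-node cluster minus the node being decommissioned), the function names are made up for this example, and the `grep` pattern assumes that fsck marks affected blocks with the words "Under replicated", which may differ between versions. Prefix the `hadoop` calls with `sudo -u hdfs` if the HDFS superuser is a separate account on your installation.

```shell
#!/bin/sh
# Sketch only: assumes 3 DataNodes remain after the decommission.
REMAINING_DATANODES="${REMAINING_DATANODES:-3}"

# Cap a requested replication factor at the number of remaining DataNodes.
cap_replication() {
    requested="$1"
    if [ "$requested" -gt "$REMAINING_DATANODES" ]; then
        echo "$REMAINING_DATANODES"
    else
        echo "$requested"
    fi
}

# Show under-replicated blocks below a path, then lower the replication there.
fix_replication() {
    path="$1"
    requested="$2"
    hadoop fsck "$path" -files -blocks | grep -i "under replicated"
    hadoop fs -setrep -R "$(cap_replication "$requested")" "$path"
}

# Usage (the path is a placeholder):
#   fix_replication /some/path 10
```

Capping the value rather than hard-coding it keeps files that already have a low replication factor unchanged while bringing down any that ask for more replicas than the cluster can hold.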
-
Re: Decommissioning runs for ever
Martin Gerlach 2012-08-08, 15:37
> This can be fixed by running "sudo -u hdfs fsck /" to find the files,
> then "hadoop fs -setrep -R <num> <path>" to set the replication to a
> value <num> which is less than or equal to the number of remaining
> data nodes.

Sorry, I meant "sudo -u hdfs hadoop fsck /", then "sudo -u hdfs hadoop fs -setrep -R <num> <path>".

--
Martin Gerlach
Software Architect Research
Neofonie GmbH
Robert-Koch-Platz 4
10115 Berlin
T +49.30 24627 413
F +49.30 24627 120
[EMAIL PROTECTED]
http://www.neofonie.de
Commercial register (Handelsregister) Berlin-Charlottenburg: HRB 67460
Management (Geschäftsführung): Thomas Kitlitschko
-
RE: Decommissioning runs for ever
Chandra Mohan, Ananda Vel... 2012-08-08, 15:41
Thanks, Martin. I will follow this tip in the future.