

Re: decommissioning node woes
If nobody else more qualified is willing to jump in, I can at least provide
some pointers.

What you describe is a bit surprising.  I have zero experience with any 0.21
version, but decommissioning worked well in much older versions, so this
would be a surprising regression.

Not all of your observations are inconsistent with how decommissioning
should work.  The fact that your node still shows up as live after
decommissioning starts isn't so strange.  The idea is that no new data will
be put on the node, and its copies should no longer be counted as replicas,
but it will still help in reading data.

So that isn't such a big worry.

The fact that it takes forever and a day, however, is a big worry.  I cannot
provide any help there offhand.

What happens when a datanode goes down?  Do you see under-replicated files?
Does the number of such files decrease over time?
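
One way to answer those questions is to capture the output of
dfsadmin -report every so often and compare the numbers.  Below is a rough
Python sketch, not a tested tool.  It assumes the report contains an
"Under replicated blocks: N" line plus a "Name:" and a
"Decommission Status :" line per datanode; I have not checked that against
0.21, so adjust the patterns to what your version prints.  The function name
and the sample text are made up for illustration.

```python
import re


def parse_dfsadmin_report(report):
    """Pull the under-replicated block count and each datanode's
    decommission status out of report text (format is an assumption)."""
    summary = {"under_replicated": None, "nodes": {}}
    current_node = None
    for raw_line in report.splitlines():
        line = raw_line.strip()

        # Cluster-wide counter; keep only the first occurrence.
        m = re.match(r"Under replicated blocks:\s*(\d+)", line)
        if m and summary["under_replicated"] is None:
            summary["under_replicated"] = int(m.group(1))
            continue

        # Start of a per-datanode section, e.g. "Name: 10.0.0.11:50010".
        m = re.match(r"Name:\s*(\S+)", line)
        if m:
            current_node = m.group(1)
            continue

        # Status line belonging to the most recently seen datanode.
        m = re.match(r"Decommission Status\s*:\s*(.+)", line)
        if m and current_node is not None:
            summary["nodes"][current_node] = m.group(1).strip()
    return summary


# Illustrative sample only; the addresses and counts are invented.
SAMPLE = """\
Under replicated blocks: 120
Blocks with corrupt replicas: 0
Missing blocks: 0

Name: 10.0.0.11:50010
Decommission Status : Decommission in progress

Name: 10.0.0.12:50010
Decommission Status : Normal
"""

if __name__ == "__main__":
    print(parse_dfsadmin_report(SAMPLE))
```

If the under-replicated count keeps falling between runs, re-replication is
making progress and the decommission should eventually finish.  If it stays
flat for hours, that points at the namenode not scheduling the copies.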

On Fri, Mar 18, 2011 at 4:23 AM, Rita <[EMAIL PROTECTED]> wrote:

> Any help?
>
>
> On Wed, Mar 16, 2011 at 9:36 PM, Rita <[EMAIL PROTECTED]> wrote:
>
> > Hello,
> >
> > I have been struggling with decommissioning data nodes. I have a 50+ data
> > node cluster (no MR) with each server holding about 2TB of storage. I split
> > the nodes into 2 racks.
> >
> >
> > I edit the 'exclude' file and then do a -refreshNodes. I see the node
> > immediately in 'Decommissioned node' and I also see it as a 'live' node!
> > Even though I wait 24+ hours, it's still like this. I suspect it's a bug
> > in my version.  The data node process is still running on the node I am
> > trying to decommission. So, sometimes I kill -9 the process and I see the
> > 'under replicated' blocks...this can't be the normal procedure.
> >
> > There were even times that I had corrupt blocks because I was impatient --
> > waited 24-34 hours
> >
> > I am using the 23 August, 2010 release, version 0.21.0
> > <http://hadoop.apache.org/hdfs/releases.html#23+August%2C+2010%3A+release+0.21.0+available>.
> >
> > Is this a known bug? Is there anything else I need to do to decommission a
> > node?
> >
> > --
> > --- Get your facts first, then you can distort them as you please.--
> >
>
>
>
> --
> --- Get your facts first, then you can distort them as you please.--
>