Hadoop >> mail # user >> decommissioning node woes


Re: decommissioning node woes
Unless the last copy is on that node.

Decommissioning is the only safe way to shut off 10 nodes at once.  Doing
them one at a time and waiting for replication to (asymptotically) recover
is painful and error-prone.

On Fri, Mar 18, 2011 at 9:08 AM, James Seigel <[EMAIL PROTECTED]> wrote:

> Just a note.  If you just shut the node off, the blocks will replicate
> faster.
>
> James.
>
>
> On 2011-03-18, at 10:03 AM, Ted Dunning wrote:
>
> > If nobody else more qualified is willing to jump in, I can at least
> > provide some pointers.
> >
> > What you describe is a bit surprising.  I have zero experience with any
> > 0.21 version, but decommissioning was working well in much older
> > versions, so this would be a surprising regression.
> >
> > The observations you have aren't all inconsistent with how
> > decommissioning should work.  The fact that your nodes look up after
> > starting the decommissioning isn't so strange.  The idea is that no new
> > data will be put on the node, nor should it be counted as a replica, but
> > it will help in reading data.
> >
> > So that isn't such a big worry.
> >
> > The fact that it takes forever and a day, however, is a big worry.  I
> > cannot provide any help there just off hand.
> >
> > What happens when a datanode goes down?  Do you see under-replicated
> > files?  Does the number of such files decrease over time?
> >
> > On Fri, Mar 18, 2011 at 4:23 AM, Rita <[EMAIL PROTECTED]> wrote:
> >
> >> Any help?
> >>
> >>
> >> On Wed, Mar 16, 2011 at 9:36 PM, Rita <[EMAIL PROTECTED]> wrote:
> >>
> >>> Hello,
> >>>
> >>> I have been struggling with decommissioning data nodes. I have a 50+
> >>> data node cluster (no MR) with each server holding about 2TB of
> >>> storage. I split the nodes into 2 racks.
> >>>
> >>>
> >>> I edit the 'exclude' file and then do a -refreshNodes. I see the node
> >>> immediately in 'Decommissioned node' and I also see it as a 'live' node!
> >>> Even though I wait 24+ hours, it's still like this. I suspect it's a bug
> >>> in my version.  The data node process is still running on the node I am
> >>> trying to decommission. So, sometimes I kill -9 the process and I see
> >>> the 'under replicated' blocks...this can't be the normal procedure.
> >>>
> >>> There were even times that I had corrupt blocks because I was impatient
> >>> -- waited 24-34 hours
> >>>
> >>> I am using 23 August, 2010: release 0.21.0
> >>> <http://hadoop.apache.org/hdfs/releases.html#23+August%2C+2010%3A+release+0.21.0+available>
> >>> version.
> >>>
> >>> Is this a known bug? Is there anything else I need to do to decommission
> >>> a node?
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> --
> >>> --- Get your facts first, then you can distort them as you please.--
> >>>
> >>
> >>
> >>
> >> --
> >> --- Get your facts first, then you can distort them as you please.--
> >>
>
>
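
The point at the top of this thread (shutting nodes off is only safe "unless the last copy is on that node") can be made concrete with a rough back-of-the-envelope calculation. The sketch below is only an illustration under loudly stated assumptions that are not from the thread: a replication factor of 3, replicas placed uniformly at random on distinct nodes (real HDFS placement is rack-aware, which changes the numbers), and a made-up block count. The function names are hypothetical. The node counts (50 total, 10 shut off at once) are the figures mentioned in the messages above.

```python
from math import comb


def prob_block_lost(total_nodes: int, nodes_removed: int, replication: int = 3) -> float:
    """Probability that every replica of one block sits on the removed nodes.

    Assumes replicas are placed uniformly at random on distinct nodes,
    which ignores HDFS rack awareness (an assumption, not HDFS behavior).
    """
    # Ways to put all replicas on the removed nodes / ways to place them anywhere.
    return comb(nodes_removed, replication) / comb(total_nodes, replication)


def expected_blocks_lost(num_blocks: int, total_nodes: int, nodes_removed: int,
                         replication: int = 3) -> float:
    """Expected number of blocks whose replicas all disappear at once."""
    return num_blocks * prob_block_lost(total_nodes, nodes_removed, replication)


if __name__ == "__main__":
    # 50 nodes and 10 removed come from the thread; 500,000 blocks is a
    # hypothetical figure chosen only for illustration.
    p = prob_block_lost(total_nodes=50, nodes_removed=10)
    print(f"per-block loss probability: {p:.5f}")
    print(f"expected blocks lost: {expected_blocks_lost(500_000, 50, 10):.0f}")
```

Under these assumptions, switching off a single node never loses a block (the other two replicas survive), while switching off 10 of 50 nodes simultaneously leaves a small but nonzero fraction of blocks with no replica left, which is the case decommissioning is meant to avoid.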