-
decommissioning node woes
Rita 2011-03-17, 01:36

Hello,

I have been struggling with decommissioning data nodes. I have a 50+ data node cluster (no MR), with each server holding about 2TB of storage. I split the nodes into 2 racks.

I edit the 'exclude' file and then do a -refreshNodes. I see the node immediately under 'Decommissioned node', and I also see it as a 'live' node! Even though I wait 24+ hours, it's still like this. I suspect it's a bug in my version. The data node process is still running on the node I am trying to decommission. So sometimes I kill -9 the process and I see the 'under replicated' blocks... this can't be the normal procedure.

There were even times that I had corrupt blocks because I was impatient (I had waited 24-34 hours).

I am using the 23 August, 2010 release, 0.21.0 <http://hadoop.apache.org/hdfs/releases.html#23+August%2C+2010%3A+release+0.21.0+available>.

Is this a known bug? Is there anything else I need to do to decommission a node?

--
Get your facts first, then you can distort them as you please.
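
For readers following along, the two steps described in this post (list the host in the exclude file, then ask the NameNode to re-read it) could be scripted roughly as below. This is only a sketch under assumptions: the exclude file is taken to be the one your `dfs.hosts.exclude` setting points at, the path and hostname in the usage note are made up, and the `hadoop` launcher is assumed to be on the PATH (some releases prefer the `hdfs` launcher for `dfsadmin`).

```python
import subprocess
from pathlib import Path


def add_to_exclude_file(exclude_path, hostname):
    """Add hostname to the exclude file unless it is already listed.

    Returns True if the file was changed, False if the host was already there.
    """
    path = Path(exclude_path)
    text = path.read_text() if path.exists() else ""
    if hostname in text.split():
        return False
    # Make sure the new entry starts on its own line.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    path.write_text(text + separator + hostname + "\n")
    return True


def refresh_nodes():
    """Ask the NameNode to re-read its include/exclude files.

    Assumes the Hadoop client is installed; raises if the command fails.
    """
    subprocess.run(["hadoop", "dfsadmin", "-refreshNodes"], check=True)
```

A hypothetical use would be `add_to_exclude_file("/etc/hadoop/conf/exclude", "dn17.example.com")` followed by `refresh_nodes()`. Making the file edit idempotent means the script can be re-run without listing the same host twice.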
-
Re: decommissioning node woes
Rita 2011-03-18, 11:23

Any help?
-
Re: decommissioning node woes
Ted Dunning 2011-03-18, 16:03

If nobody else more qualified is willing to jump in, I can at least provide some pointers.

What you describe is a bit surprising. I have zero experience with any 0.21 version, but decommissioning was working well in much older versions, so this would be a surprising regression.

The observations you have aren't all inconsistent with how decommissioning should work. The fact that your nodes look up after starting the decommissioning isn't so strange. The idea is that no new data will be put on the node, nor should it be counted as a replica, but it will help in reading data.

So that isn't such a big worry.

The fact that it takes forever and a day, however, is a big worry. I cannot provide any help there just off hand.

What happens when a datanode goes down? Do you see under-replicated files? Does the number of such files decrease over time?
-
Re: decommissioning node woes
James Seigel 2011-03-18, 16:08

Just a note. If you just shut the node off, the blocks will replicate faster.

James.
-
Re: decommissioning node woes
Ted Dunning 2011-03-18, 16:38

Unless the last copy is on that node.

Decommissioning is the only safe way to shut off 10 nodes at once. Doing them one at a time and waiting for replication to (asymptotically) recover is painful and error prone.
-
RE: decommissioning node woes
Michael Segel 2011-03-18, 16:59

Uhmm...

If you use the default bandwidth allocation and you have a lot of data on the node you want to decommission, you can be waiting for weeks before you can safely take the node out.

If you wanted to, you could take the nodes down one by one, running an fsck between removals to get the under-replicated blocks identified and replicated. ("Normally Namenode automatically corrects most of the recoverable failures.")

Once you see those blocks successfully replicated... you can take down the next.

Is it clean? No, not really.
Is it dangerous? No, not really.
Do I recommend it? No, but it's a quick and dirty way of doing things...

Or you can up your dfs.balance.bandwidthPerSec in the configuration files. The default is pretty low.

The downside is that you have to bounce the cloud to get this value updated, and it could have a negative impact on performance if set too high.

HTH

-Mike
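
The "fsck in between" check from this message could be automated along these lines. It is a sketch, not a tested operational tool: it assumes the fsck summary contains a line starting with `Under-replicated blocks:` followed by a count, which you should confirm against the output of your own release, and the helper names are invented for illustration.

```python
import re
import subprocess

# Assumed wording of the fsck summary line; verify against your release.
UNDER_REPLICATED = re.compile(r"Under-replicated blocks:\s*(\d+)")


def count_under_replicated(fsck_output):
    """Return the under-replicated block count, or None if the line is missing."""
    match = UNDER_REPLICATED.search(fsck_output)
    return int(match.group(1)) if match else None


def safe_to_remove_next_node(fsck_output):
    """True only when fsck explicitly reports zero under-replicated blocks."""
    return count_under_replicated(fsck_output) == 0


def run_fsck(path="/"):
    """Run fsck on the given path and return its text output.

    Assumes the Hadoop client is installed and on the PATH.
    """
    result = subprocess.run(
        ["hadoop", "fsck", path], capture_output=True, text=True
    )
    return result.stdout
```

Treating a missing summary line as "not safe" is deliberate: if the output format differs from the assumption, the check fails closed rather than letting the next node be taken down.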
-
Re: decommissioning node woes
Ted Dunning 2011-03-18, 17:34

I like to keep that rather high. If I am decommissioning nodes, I generally want them out of the cluster NOW.

That is probably a personality defect on my part.

On Fri, Mar 18, 2011 at 9:59 AM, Michael Segel <[EMAIL PROTECTED]> wrote:

> Once you see those blocks successfully replicated... you can take down the
> next.
>
> Is it clean? No, not really.
> Is it dangerous? No, not really.
> Do I recommend it? No, but it's a quick and dirty way of doing things...
>
> Or you can up your dfs.balance.bandwidthPerSec in the configuration files.
> The default is pretty low.
>
> The downside is that you have to bounce the cloud to get this value
> updated, and it could have a negative impact on performance if set too high.
-
Re: decommissioning node woes
James Seigel 2011-03-18, 17:39

I agree.

J
-
Re: decommissioning node woes
Steve Loughran 2011-03-18, 17:57

On 18/03/11 17:34, Ted Dunning wrote:
> I like to keep that rather high. If I am decommissioning nodes, I generally
> want them out of the cluster NOW.

Depends on your backbone B/W I guess. And how well the switches really work vs claim to work.

One thought here: does the decommissioning give priority to blocks that are only replicated on the machine(s) being decommissioned? If not, it's something to consider prioritising.
-
RE: decommissioning node woes
Michael Segel 2011-03-19, 03:30

Well... When you look at the default value... and compare it to DNs having 7+TB of disk space... The math doesn't look good.

If you have 1GbE and a good ToR from Cisco, Blade Networks (now IBM), and a couple of others... they can do it.

I had a conversation with a switch provider and he indicated that by the end of this year it's realistic to find 10GbE on the motherboard. Then budget 12K (USD) for the ToR. Depending on the model, you can trunk with 1 or more 10GbE, or they may have a separate module for trunking/uplinks. (There, prices may be more.)

But I digress.

With a 1GbE port, you could go 100Mbs for the bandwidth limit. If you bond your ports, you could go higher.

I guess the point is that many don't realize that they need to up the default limit.

HTH

-Mike
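
To put rough numbers on "the math doesn't look good", here is a back-of-the-envelope sketch. The assumptions are mine, not the poster's: the default limit is taken as 1 MiB/s (1048576 bytes per second), "100Mbs" is read as 100 megabits per second, a TB is 10^12 bytes, and, as the thread does, the limit is assumed to be the only bottleneck on draining one node.

```python
SECONDS_PER_DAY = 86400
TB = 10**12                      # decimal terabyte, in bytes

DEFAULT_LIMIT = 1024 * 1024      # assumed default limit: 1 MiB/s
HUNDRED_MBIT = 100 * 10**6 // 8  # 100 Mbit/s expressed in bytes per second


def transfer_days(data_bytes, bytes_per_second):
    """Days needed to move data_bytes at a steady bytes_per_second."""
    return data_bytes / bytes_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    for label, limit in [("default", DEFAULT_LIMIT), ("100 Mbit/s", HUNDRED_MBIT)]:
        for size_tb in (2, 7):
            days = transfer_days(size_tb * TB, limit)
            print(f"{size_tb} TB at {label}: {days:.1f} days")
```

Under these assumptions a 7TB node takes on the order of 77 days at the default and about 6.5 days at 100 Mbit/s, and the 2TB nodes from the original post take about 22 days at the default, which is consistent with the "weeks" mentioned earlier in the thread.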
-
Re: decommissioning node woes
Ted Dunning 2011-03-19, 16:00

Unfortunately this doesn't help much because it is hard to get the ports to balance the load.

On Fri, Mar 18, 2011 at 8:30 PM, Michael Segel <[EMAIL PROTECTED]> wrote:

> With a 1GbE port, you could go 100Mbs for the bandwidth limit.
> If you bond your ports, you could go higher.
-
RE: decommissioning node woes
Michael Segel 2011-03-20, 02:11

Usually the port bonding is done at a lower level so that you and your applications see this as a single port. So you don't have to worry about load balancing between the ports. (Or am I missing something?)

thx

-Mike
Re: decommissioning node woes
M. C. Srivas 2011-03-20, 02:52

All trunking/bonding at the switch (eg, LACP) gives only 1 NIC's worth of bandwidth point-to-point, even if your boxes all have multiple NICs. It chooses a NIC at connection initiation (via round-robin, or load, or whatever). But once the TCP connection is established, there is no load-balancing --
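
The point about a single connection being pinned to one NIC can be illustrated with a toy model. This is a sketch only: real switches and bonding drivers use their own, vendor-specific selection policies, and the CRC32-of-the-flow rule here is a stand-in chosen for determinism, with made-up addresses and ports.

```python
import zlib


def pick_link(src_ip, src_port, dst_ip, dst_port, link_count):
    """Choose a link for a flow once, from its addresses and ports.

    Stand-in for whatever policy the bond really uses; the key property is
    that the same flow always maps to the same link.
    """
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % link_count


def link_loads(flows, link_count):
    """Sum the offered rate (in Gbit/s) that lands on each link.

    flows is a list of ((src_ip, src_port, dst_ip, dst_port), rate) pairs.
    """
    loads = [0.0] * link_count
    for flow, rate in flows:
        loads[pick_link(*flow, link_count)] += rate
    return loads


if __name__ == "__main__":
    # One large block transfer between two datanodes over a 2-link bond.
    one_flow = [(("10.0.0.1", 50010, "10.0.0.2", 41234), 2.0)]
    print(link_loads(one_flow, link_count=2))
```

With a single flow, all of the offered rate lands on one link and the other stays idle, so the transfer cannot exceed one NIC's capacity no matter how many links are bonded. Only a mix of many distinct flows has a chance of spreading across the bond.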
-
Re: decommissioning node woes
Steve Loughran 2011-03-21, 10:39

On 19/03/11 16:00, Ted Dunning wrote:
> Unfortunately this doesn't help much because it is hard to get the ports to
> balance the load.
>
> On Fri, Mar 18, 2011 at 8:30 PM, Michael Segel <[EMAIL PROTECTED]> wrote:
>
>> With a 1GbE port, you could go 100Mbs for the bandwidth limit.
>> If you bond your ports, you could go higher.

Port bonding is possible, it's just harder to
-set up all the cabling
-be sure both ports are fully utilised

It's less expensive than 10G ether because those switches cost a lot more, and with 2x1 you can have separate ToR switches for more redundancy.

For decommissioning, why not boost the rebalance bandwidth before you trigger the decommission, then drop it afterwards?

-steve