|
David B. Ritch
2009-09-11, 01:30
Alex Loddengaard
2009-09-11, 01:39
Amandeep Khurana
2009-09-11, 02:07
David B. Ritch
2009-09-11, 03:06
Ted Dunning
2009-09-11, 03:11
Michael Thomas
2009-09-11, 03:16
Ted Dunning
2009-09-11, 03:37
Allen Wittenauer
2009-09-11, 16:23
Edward Capriolo
2009-09-11, 17:56
Boris Shkolnik
2009-09-14, 17:20
|
-
Decommissioning Individual Disks David B. Ritch 2009-09-11, 01:30
What do you do with the data on a failing disk when you replace it?

Our support person comes in occasionally, and often replaces several disks when he does. These are disks that have not yet failed, but firmware indicates that failure is imminent. We need to be able to migrate our data off these disks before replacing them.

If we were replacing entire servers, we would decommission them, but we have 3 data disks per server. If we were replacing one disk at a time, we wouldn't worry about it (because of redundancy). We can decommission the servers, but moving all the data off of all their disks is a waste.

What's the best way to handle this?

Thanks!

David
-
Re: Decommissioning Individual Disks Alex Loddengaard 2009-09-11, 01:39
Hi David,

Unfortunately there's really no way to do what you're hoping to do automatically. You can move the block files (including their .meta files) from one disk to another. Do this while the datanode daemon is stopped. Then, when you start the datanode daemon, it will scan dfs.data.dir and be totally happy if blocks have moved between hard drives. I've never tried this myself, but others on the list have suggested this technique for "balancing disks."

You could also change your process around a little. It's not too crazy to decommission an entire node, replace one of its disks, then bring it back into the cluster. Seems to me that this is a much saner approach: your ops team tells you which disk needs replacing, you decommission the node, they replace the disk, and you add the node back to the pool. Your call I guess, though.

Hope this was helpful.

Alex
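
The manual block move Alex describes could be sketched roughly as follows. This is a minimal illustration, not a tested procedure: it assumes block files are named blk_* (with matching blk_*.meta files) somewhere under each dfs.data.dir entry, the function name move_block_files is hypothetical, and the datanode daemon must be stopped before anything like this is run.

```python
import shutil
from pathlib import Path


def move_block_files(src_dir, dst_dir):
    """Move blk_* files (block data and .meta) from src_dir to dst_dir.

    Relative paths are preserved, so a block under src_dir/current/subdir0
    ends up under dst_dir/current/subdir0. Assumption: the on-disk layout
    uses the blk_ prefix for both block and .meta files.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    moved = []
    # Build the full listing first so that moving files does not disturb the walk.
    for path in sorted(p for p in src.rglob("blk_*") if p.is_file()):
        target = dst / path.relative_to(src)
        if target.exists():
            # Never overwrite an existing block on the destination disk.
            raise FileExistsError(f"refusing to overwrite {target}")
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved
```

A call might look like move_block_files("/data/disk1/dfs/data", "/data/disk2/dfs/data"), where both paths are placeholders for two entries of your own dfs.data.dir. Files that do not start with blk_ (such as a VERSION file) are left where they are.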
-
Re: Decommissioning Individual Disks Amandeep Khurana 2009-09-11, 02:07
I think decommissioning the node and replacing the disk is a cleaner approach. That's what I'd recommend doing as well.

--
Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz
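
For reference, decommissioning a node is normally driven by the exclude file named in dfs.hosts.exclude, followed by `hadoop dfsadmin -refreshNodes` so the namenode rereads it. A minimal sketch of the bookkeeping side, assuming a plain one-host-per-line exclude file; the helper name add_to_excludes and the example hostname are hypothetical:

```python
from pathlib import Path

# Command that asks the namenode to reread its include/exclude files.
REFRESH_CMD = ["hadoop", "dfsadmin", "-refreshNodes"]


def add_to_excludes(exclude_file, hostname):
    """Append hostname to the exclude file unless it is already listed.

    Returns True if the file was changed, which is the signal that
    REFRESH_CMD should be run afterwards.
    """
    path = Path(exclude_file)
    hosts = path.read_text().split() if path.exists() else []
    if hostname in hosts:
        return False
    hosts.append(hostname)
    path.write_text("\n".join(hosts) + "\n")
    return True
```

If add_to_excludes(...) returns True, running REFRESH_CMD (for example with subprocess.run(REFRESH_CMD, check=True)) starts the decommission. After the disk is replaced, removing the host from the file and refreshing again returns the node to the pool.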
-
Re: Decommissioning Individual Disks David B. Ritch 2009-09-11, 03:06
Thank you both. That's what we did today. It seems fairly reasonable when a node has a few disks, say 3-5. However, at some sites, with larger nodes, it seems more awkward. When a node has a dozen or more disks (as used in the larger terasort benchmarks), migrating the data off all the disks is likely to be more of an issue. I hope that there is a better solution to this before my client moves to much larger nodes! ;-)

dbr
-
Re: Decommissioning Individual Disks Ted Dunning 2009-09-11, 03:11
I would recommend taking the node down without decommissioning, replacing the disk, then bringing the node back up. After 10-20 minutes the name node will figure things out and start replicating the missing blocks. Rebalancing would be a good idea to fill the new disk. You could even do this with two nodes at a time, but I don't recommend that.

As soon as dfs shows no under-replicated blocks, you can do the next disk. It could take some time for that to happen.

--
Ted Dunning, CTO
DeepDyve
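
One way to check Ted's "no under-replicated blocks" condition is to read the summary printed by `hadoop fsck /`. The sketch below assumes that summary contains a line of the form "Under-replicated blocks: N (x %)"; the exact wording may differ between versions, and both function names are hypothetical.

```python
import re


def under_replicated_count(fsck_output):
    """Return the under-replicated block count from fsck output, or None.

    Assumption: the summary has a line like
    'Under-replicated blocks:   12 (0.4 %)'.
    """
    match = re.search(r"Under-replicated blocks:\s*(\d+)", fsck_output)
    return int(match.group(1)) if match else None


def safe_to_swap_next_disk(fsck_output):
    """True only when fsck explicitly reports zero under-replicated blocks."""
    return under_replicated_count(fsck_output) == 0
```

Returning None when the line is missing keeps an unparseable report from being mistaken for a healthy cluster, so safe_to_swap_next_disk stays False in that case.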
-
Re: Decommissioning Individual Disks Michael Thomas 2009-09-11, 03:16
What would happen if you did this without taking the node down? For example, if you have hot-swappable drives in the node(s)? Will the running datanode process pick up the fact that an entire partition goes missing and reappears empty a few minutes later?

Or would it be better to at least shut off the datanode process in this scenario?

--Mike
-
Re: Decommissioning Individual Disks Ted Dunning 2009-09-11, 03:37
I think that would be a bit too rude. The datanode scans the data partition when it comes up. Better to give it the benefit of a bounce so that it can inform the name node of the new state of affairs.

On Thu, Sep 10, 2009 at 8:16 PM, Michael Thomas <[EMAIL PROTECTED]> wrote:
> Will the running datanode process pick up the fact that an entire partition
> goes missing and reappears empty a few minutes later?
>
> Or would it be better to at least shut off the datanode process in this
> scenario?

--
Ted Dunning, CTO
DeepDyve
-
Re: Decommissioning Individual Disks Allen Wittenauer 2009-09-11, 16:23
On 9/10/09 8:06 PM, "David B. Ritch" <[EMAIL PROTECTED]> wrote:
> Thank you both. That's what we did today. It seems fairly reasonable
> when a node has a few disks, say 3-5. However, at some sites, with
> larger nodes, it seems more awkward.

Hmm. The vast majority of sites that I know of are using 4-disk configurations. I'd love to know who is using 5 or more drives and have a conversation with them.

[The only people I know of who did terasort on 12 disks are Google... and they weren't using Hadoop. :)]
-
Re: Decommissioning Individual Disks Edward Capriolo 2009-09-11, 17:56
On Fri, Sep 11, 2009 at 12:23 PM, Allen Wittenauer <[EMAIL PROTECTED]> wrote:
> Hmm. The vast majority of sites are using 4 disk configurations, that I
> know of. I'd love to know who using 5 or more drives and have a
> conversation with them.
>
> [The only people who did terasort on 12 disks that I know of is Google...
> and they weren't using Hadoop. :)]

>>> Will the running datanode process pick up the fact that an entire partition goes missing and reappears empty a few minutes later?

If you lose a directory, the datanode stops. See https://issues.apache.org/jira/browse/HDFS-457

FYI, we are using 8 1TB SATA disks on our data nodes. When I lose a disk, Hadoop "self heals". You can control the network bandwidth used for replication, and you have the balancer app. With large disks you really don't have time to copy data off, since the datanode is going to be marked down in 10 minutes and all the data will begin getting copied elsewhere.
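
The "10 minutes" figure lines up with how the namenode's dead-node timeout is commonly described for Hadoop releases of this era: twice the heartbeat recheck interval plus ten heartbeat intervals. The property names and defaults used below (heartbeat.recheck.interval = 300000 ms, dfs.heartbeat.interval = 3 s) are assumptions to verify against your own version, and the function name is hypothetical.

```python
def dead_node_timeout_seconds(recheck_interval_ms=300_000, heartbeat_interval_s=3):
    """Seconds of silence before a datanode is treated as dead.

    Assumed formula: 2 * recheck interval + 10 * heartbeat interval.
    """
    return 2 * recheck_interval_ms / 1000 + 10 * heartbeat_interval_s


# With the assumed defaults: 2 * 300 s + 10 * 3 s = 630 s, i.e. 10.5 minutes.
timeout = dead_node_timeout_seconds()
print(f"{timeout:.0f} s ({timeout / 60:.1f} min)")
```

Under these assumptions, a disk swap that keeps the datanode offline for longer than about ten and a half minutes will trigger re-replication of every block on the node, which is the window Edward is pointing at.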
-
Re: Decommissioning Individual Disks Boris Shkolnik 2009-09-14, 17:20
The JIRA is committed already, so losing a directory will not stop the datanode anymore. If the directory shows up again later, it will not be added back automatically.

On 9/11/09 10:56 AM, "Edward Capriolo" <[EMAIL PROTECTED]> wrote:
> If you lose a directory the datanode stops. See
> https://issues.apache.org/jira/browse/HDFS-457
|