Re: How to remove three disks from three different nodes in a ten node cluster in less than an hour without losing replicas?
Colin McCabe 2013-02-04, 22:53
It sounds like what you would like is a way to decommission just one
storage directory on the DataNode. We don't currently support that.
You might be able to get something approaching this result with
"chmod 000 $storage_directory_root". That would at least prevent new
blocks from being created on the disk which you don't trust any more. It
would also cause the existing blocks to be re-replicated when the
DirectoryScanner re-ran and noticed it couldn't get to them. Note that I
haven't actually tested the chmod solution, though, so your mileage may vary.
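As noted, the chmod approach is untested; a minimal sketch of the idea might look like the following (the storage path and the fsck check are assumptions, and this demo runs against a scratch directory rather than a live DataNode's dfs.datanode.data.dir):

```shell
#!/bin/sh
# Demo on a throwaway directory. In practice STORAGE_DIR would be the
# dfs.datanode.data.dir entry for the suspect disk (hypothetical path).
STORAGE_DIR=$(mktemp -d)

# Revoke all permissions: the DataNode can no longer place new blocks
# there, and the DirectoryScanner will find the existing blocks
# unreadable, which should trigger re-replication from the surviving
# replicas.
chmod 000 "$STORAGE_DIR"

# Before physically pulling the disk, wait for a clean fsck so every
# block is back at full replication, e.g.:
#   until hdfs fsck / | grep -q 'Status: HEALTHY'; do sleep 60; done

# Cleanup for this demo only; on a real node you would pull the disk.
chmod 700 "$STORAGE_DIR"
rmdir "$STORAGE_DIR"
```

Repeating this one disk at a time, with a clean fsck between disks, keeps at least two replicas of every block available throughout.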
On Wed, Jan 30, 2013 at 10:34 PM, Stack <[EMAIL PROTECTED]> wrote:
> Here is a little puzzle.
> An admin works for a cash-strapped, popular web shop. At the datacenter
> she has a ten node cluster that is heavily used. It runs hot all day long
> and decommissioning a node with its background replicating of 12 disks
> worth of data messes up the work load she has on top of it and makes her
> clients very unhappy. Replicating the data of one node takes at least an
> hour. This cluster has three bad disks in three different nodes
> (replication factor is 3). The admin lives an hour from the datacenter.
> She can't afford a cage monkey and so must replace the disks herself.
> If she left home at 2pm and had to be back by 6pm before the kids came
> home from school, how would she replace the three disks while being certain
> not to lose a replica?
> Is the only answer to remove one, wait for a clean fsck run, then remove the next one?