How to remove three disks from three different nodes in a ten-node cluster in less than an hour without losing replicas?
Here is a little puzzle.
An admin works for a cash-strapped but popular web shop. At the datacenter
she has a heavily used ten-node cluster. It runs hot all day long, and
decommissioning a node, with the background re-replication of 12 disks'
worth of data that it triggers, disrupts the workload running on top of
the cluster and makes her clients very unhappy. Re-replicating the data of
one node takes at least an hour. The cluster has three bad disks in three
different nodes (the replication factor is 3). The admin lives an hour
from the datacenter. She can't afford a cage monkey, so she must replace
the disks herself. If she left home at 2pm and had to be back by 6pm,
before the kids came home from school, how would she replace the three
disks while being certain not to lose any replicas?
Is the only answer to remove one disk, wait for a clean fsck run, and then remove the next one?
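As a rough sanity check on the "one at a time" approach, here is a small Python sketch of the time budget. The 2pm-to-6pm window and the one-hour drive each way come from the puzzle. The 10-minute physical swap time and the one-hour wait per disk for a clean fsck are assumptions, not figures from the puzzle (it only says a whole node takes at least an hour to re-replicate, so a single disk may well be faster). The helper names are hypothetical.

```python
from datetime import timedelta


def on_site_budget(leave, back_by, drive):
    """Time left at the datacenter after driving there and back."""
    return (back_by - leave) - 2 * drive


def sequential_plan_time(disks, swap, wait):
    """Swap one disk, wait for a clean fsck, then swap the next.

    Assumes no wait is needed after the last swap: she can leave while
    the final re-replication finishes in the background.
    """
    return disks * swap + (disks - 1) * wait


leave = timedelta(hours=14)    # 2pm, as an offset from midnight
back_by = timedelta(hours=18)  # 6pm
drive = timedelta(hours=1)     # one hour each way

# Assumed values, not from the puzzle.
swap = timedelta(minutes=10)
wait = timedelta(hours=1)

budget = on_site_budget(leave, back_by, drive)
needed = sequential_plan_time(3, swap, wait)
fits = needed <= budget

# Longest per-disk wait that would still fit: two waits share what the
# three swaps leave over.
max_wait = (budget - 3 * swap) / 2

print(budget, needed, fits, max_wait)
```

Under these assumptions she has two hours on site, the sequential plan needs two and a half, and it only fits if each clean fsck arrives within 45 minutes of the previous swap.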
Colin McCabe 2013-02-04, 22:53