Yes, you will have redundancy, so no single point of hardware failure can wipe out your data, short of a major catastrophe. But an errant or malicious "hadoop fs -rm -rf" can still shut you down. If you still have the original source of your data somewhere else, you may be able to recover by reprocessing it, but if this cluster is the single repository for all your data, you may have a problem.
On 5/29/12 11:40 AM, "Michael Segel" <[EMAIL PROTECTED]> wrote:
That's not a backup strategy.
You could still have joe luser take out a key file or directory. What do you do then?
On May 29, 2012, at 11:19 AM, Darrell Taylor wrote:
> We are about to build a 10 machine cluster with 40TB of storage, obviously
> as this gets full actually trying to create an offsite backup becomes a
> problem unless we build another 10 machine cluster (too expensive right
> now). Not sure if it will help but we have planned the cabinet into an
> upper and lower half with separate redundant power, then we plan to put
> half of the cluster in the top, half in the bottom, effectively 2 racks, so
> in theory we could lose half the cluster and still have the copies of all
> the blocks with a replication factor of 3? Apart from the data centre
> burning down or some other disaster that would render the machines totally
> unrecoverable, is this approach good enough?
> I realise this is a very open question and everyone's circumstances are
> different, but I'm wondering what other people's experiences/opinions are
> for backing up cluster data?
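
For the split-cabinet layout in the quoted message to help, HDFS has to know which machines sit in which half; without that, every node is treated as being in one default rack and all three replicas of a block can land in the same half. Hadoop can get this information from a topology script, which it calls with one or more node addresses and which prints one rack path per address. A minimal sketch, assuming hypothetical addresses 10.0.0.1 to 10.0.0.10 and hypothetical rack names /upper and /lower (none of these come from the thread):

```python
#!/usr/bin/env python3
"""Sketch of a Hadoop rack topology script for a cabinet split into two halves."""
import sys

# Hypothetical addressing, not from the thread: 10.0.0.1-5 sit in the upper
# half of the cabinet, 10.0.0.6-10 in the lower half.
RACK_BY_HOST = {}
for i in range(1, 11):
    RACK_BY_HOST["10.0.0.%d" % i] = "/upper" if i <= 5 else "/lower"

# Fallback for unknown hosts; kept at the same path depth as the racks above.
DEFAULT_RACK = "/default-rack"


def resolve(hosts):
    """Return one rack path per host, in the same order as the input."""
    return [RACK_BY_HOST.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Hadoop passes one or more addresses as arguments and reads the
    # whitespace-separated rack paths from standard output.
    print(" ".join(resolve(sys.argv[1:])))
```

The script path is set in core-site.xml through the topology.script.file.name property (net.topology.script.file.name in later releases). With two racks defined and a replication factor of 3, the default placement policy spreads each block's replicas across both racks, so losing one half should still leave a copy of every block. As the replies point out, this only covers hardware failure; it does nothing against an accidental or malicious delete.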