Matthew, the short answer is hire a consultant to work with you on your DR/BCP strategy. :-)
Short of that, there are a few things to think about.
Is your backup cluster at the same site? (What happens when the site goes down?)
Are you planning to make your backup cluster and main cluster homogeneous? By this I mean, if your main cluster has 1PB of disk with 4x2TB or 4x3TB drives per node, will your backup cluster have the same configuration?
(You may want to consider asymmetry in designing your clusters, so that your backup cluster has fewer nodes but more drives per node.)
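To put rough numbers on the asymmetry idea, here is a back-of-the-envelope sketch. It counts raw disk only (no allowance for HDFS replication or headroom), uses decimal units (1PB = 1000TB), and the 12x3TB backup node is a made-up configuration for illustration, not a recommendation.

```python
import math

TB_PER_PB = 1000  # decimal units, as drive vendors count them


def nodes_needed(raw_tb, drives_per_node, tb_per_drive):
    """Number of nodes required to provide raw_tb of raw disk."""
    return math.ceil(raw_tb / (drives_per_node * tb_per_drive))


raw_tb = 1 * TB_PER_PB

main_2tb = nodes_needed(raw_tb, 4, 2)   # main cluster, 4x2TB per node
main_3tb = nodes_needed(raw_tb, 4, 3)   # main cluster, 4x3TB per node
backup = nodes_needed(raw_tb, 12, 3)    # hypothetical backup node, 12x3TB

print(main_2tb, main_3tb, backup)  # prints: 125 84 28
```

Same raw capacity, far fewer boxes on the backup side. The trade-off is less CPU and network per terabyte, which matters less on a cluster that mostly receives copies.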
You also have to look at your data. Are your data sets small and discrete? If so, you could probably back them up to tape (snapshots), in case a human error isn't caught in time and gets propagated to your backup cluster.
I haven't played with FUSE, so I don't know whether there are any performance issues, but on a backup cluster I don't think it's much of an issue.
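On the incremental question in your mail: one possible starting point is DistCp's -update option, which skips files that already match at the destination. The sketch below only builds the command line; the namenode hosts, port, and paths are placeholders I made up, and how this would plug into NetBackup is not something I can speak to.

```python
import shlex


def build_distcp_update(src, dst):
    """Build a 'hadoop distcp -update' command as an argument list.

    -update copies only files that are missing or differ at the destination.
    """
    return ["hadoop", "distcp", "-update", src, dst]


# Placeholder URIs; substitute your own namenodes and paths.
cmd = build_distcp_update(
    "hdfs://prod-nn:8020/data",
    "hdfs://backup-nn:8020/data",
)

# Print the command for review; run it yourself (or via subprocess) once checked.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keep in mind that -update just mirrors the current state, so a bad delete or overwrite on prod will be copied across on the next run. It does not by itself give you 60 days of recoverable history; you would still need dated copies or tape for that.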
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: Hadoop multi tier backup
> Date: Tue, 30 Aug 2011 16:54:07 +0000
> We were discussing how we would back up our data from the various environments we will have, and I was hoping someone could chime in with previous experience in this. My primary concern about our cluster is that we would like to be able to recover anything within the last 60 days, so having full backups both on tape and through DistCp is preferred.
> Our initial thoughts can be seen in the attached JPEG, but in case any of you are wary of attachments, they are also summarized below:
> Prod Cluster --DistCp--> On-site Backup cluster with Fuse mount point running NetBackup daemon --NetBackup--> Media Server --> Tape
> One of our biggest grey areas so far is how most people accomplish incremental backups. Our thought was to tie this into our NetBackup configuration, as this can be done for other connectors, but we do not see anything for HDFS yet.