You need enough storage capacity on the backup cluster to hold the data.
Lower replication (<3) may also be an option there, to save yourself some
disks/nodes.
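A back-of-the-envelope way to size the backup cluster, sketched in Python. It assumes the backup cluster holds a single up-to-date mirror of the data (not one full copy per backup), that the 2 TB/month growth continues for 12 months, and a hypothetical 12 TB of usable disk per node. These are illustrative numbers, not figures from the thread, and the function names are made up for this example.

```python
import math


def backup_raw_capacity_tb(initial_tb, growth_tb_per_month, months, replication):
    """Raw disk (TB) needed to hold one mirror of the data set after
    `months` of linear growth, stored at the given HDFS replication factor."""
    logical_tb = initial_tb + growth_tb_per_month * months
    return logical_tb * replication


def nodes_needed(raw_tb, usable_tb_per_node):
    """Number of DataNodes needed, rounding up to whole nodes.
    `usable_tb_per_node` is a hypothetical per-node disk figure."""
    return math.ceil(raw_tb / usable_tb_per_node)


if __name__ == "__main__":
    for rep in (3, 2):
        raw = backup_raw_capacity_tb(20, 2, 12, rep)
        print(f"replication={rep}: {raw} TB raw, {nodes_needed(raw, 12)} nodes")
```

With 20 TB today plus 12 months of growth (44 TB logical), replication 3 needs 132 TB of raw disk (11 of the hypothetical nodes), while replication 2 needs 88 TB (8 nodes). This ignores temporary space and any non-HDFS overhead.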
On Fri, Jan 25, 2013 at 5:04 AM, Steve Edison <[EMAIL PROTECTED]> wrote:
> Backup to disk is what we do right now. Distcp copies across HDFS
> clusters, meaning I would have to build another 12-node cluster? Is that
> correct?
> On Thu, Jan 24, 2013 at 3:32 PM, Mathias Herberts <[EMAIL PROTECTED]> wrote:
>> Backup on tape or on disk?
>> On disk, have another Hadoop cluster and do regular distcp runs.
>> On tape, make sure you have a backup program that can back up streams,
>> so you don't have to materialize your multi-TB files outside of your
>> Hadoop cluster first... (I know Simpana can't do that :-().
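A minimal sketch of the "regular distcp" approach, driven from Python. It assumes the `hadoop` CLI is on the PATH and uses only two standard distcp options: `-update` (copy only files that are missing or differ at the target) and `-m` (maximum number of map tasks). The NameNode URIs and paths are hypothetical placeholders, and by default the script only prints the command instead of running it.

```python
import subprocess


def distcp_mirror_cmd(src, dst, maps=20):
    """Build the argv for an incremental `hadoop distcp -update` run.
    `src` and `dst` are full HDFS URIs; `maps` caps the number of map tasks."""
    return ["hadoop", "distcp", "-update", "-m", str(maps), src, dst]


def run_mirror(src, dst, maps=20, dry_run=True):
    """Print the distcp command (dry run) or execute it, raising on failure."""
    cmd = distcp_mirror_cmd(src, dst, maps)
    if dry_run:
        print(" ".join(cmd))
    else:
        # check=True raises CalledProcessError if distcp exits non-zero.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Hypothetical cluster addresses; replace with your own NameNodes.
    run_mirror("hdfs://prod-nn:8020/data/logs",
               "hdfs://backup-nn:8020/backup/logs")
```

Run on a schedule, repeated `-update` passes keep the backup cluster close to the production data without recopying unchanged files.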
>> On Fri, Jan 25, 2013 at 12:29 AM, Steve Edison <[EMAIL PROTECTED]> wrote:
>> > Folks,
>> > It's been a year and my HDFS / Solr / Hive setup has been working
>> > flawlessly. The log data, which used to be meaningless to my business,
>> > has all of a sudden become so precious that our management wants to
>> > back it up. I am talking about 20 TB of active HDFS data, growing by
>> > 2 TB/month.
>> > We would like to have weekly and monthly backups, kept for up to 12 months.
>> > Any ideas how to do this?
>> > -- Steve
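One way to express the "weekly and monthly backups for up to 12 months" requirement is as a pruning rule over dated backups. This Python sketch assumes each backup is identified by its date, keeps every backup from the last 28 days as the weekly tier (the 28-day window is an assumption; the thread does not say how long weeklies should live), and keeps the earliest backup of each of the last 12 calendar months as the monthly tier. How the dated copies are actually stored is left open, and `backups_to_keep` is a hypothetical helper, not part of any Hadoop tool.

```python
from datetime import date, timedelta


def _month_index(d):
    """Count months since year 0 so calendar months can be compared."""
    return d.year * 12 + (d.month - 1)


def backups_to_keep(backup_dates, today, weekly_days=28, months=12):
    """Return the sorted list of backup dates to retain."""
    keep = set()
    dates = sorted(backup_dates)

    # Weekly tier: keep every backup taken within the last `weekly_days` days.
    for d in dates:
        if today - d <= timedelta(days=weekly_days):
            keep.add(d)

    # Monthly tier: keep the earliest backup of each of the last `months`
    # calendar months (including the current one). Because `dates` is
    # sorted, setdefault records the first backup seen in each month.
    first_of_month = {}
    for d in dates:
        first_of_month.setdefault((d.year, d.month), d)
    cutoff = _month_index(today) - (months - 1)
    for d in first_of_month.values():
        if _month_index(d) >= cutoff:
            keep.add(d)

    return sorted(keep)
```

Anything not returned by `backups_to_keep` would be a candidate for deletion on the backup side.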