How to back up HDFS data?
Steve Edison 2013-01-24, 23:29
Folks,
It's been a year, and my HDFS/Solr/Hive setup has been working flawlessly. The data logs that were meaningless to my business have suddenly become so valuable that our management wants to back up this data. I am talking about 20 TB of active HDFS data, growing by 2 TB per month. We would like to keep weekly and monthly backups for up to 12 months.
Any ideas on how to do this?
-- Steve
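The figures in the question already show the scale involved. A back-of-the-envelope sketch, assuming the 2 TB/month growth stays linear and that every retained backup is a full, uncompressed copy (the full-copy assumption is not stated in the post):

```python
# Sizing sketch using the figures from the post: 20 TB today, +2 TB per month.
# Assumption: linear growth and one full (non-incremental) copy per backup.
BASE_TB = 20
GROWTH_TB_PER_MONTH = 2


def active_size_tb(month: int) -> int:
    """Active HDFS data (TB) after `month` months of growth."""
    return BASE_TB + GROWTH_TB_PER_MONTH * month


def full_copy_total_tb(months: int) -> int:
    """Storage needed to keep one full copy taken at the end of each month,
    for `months` months, with no deduplication or incremental copies."""
    return sum(active_size_tb(m) for m in range(1, months + 1))


if __name__ == "__main__":
    print(active_size_tb(12))      # 44
    print(full_copy_total_tb(12))  # 396
```

Under these assumptions the active data reaches 44 TB after a year, and twelve monthly full copies alone would occupy 396 TB before any weekly copies are counted.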
Re: How to back up HDFS data?
Mathias Herberts 2013-01-24, 23:32
Back up to tape or to disk?
For disk, set up another Hadoop cluster and run regular distcp jobs.
For tape, make sure your backup program can back up streams, so you don't have to materialize your terabytes of files outside your Hadoop cluster first (I know Simpana can't do that :-( ).
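A minimal sketch of the disk option, assuming a second cluster reachable from the first. The NameNode URIs, the `<backup_root>/weekly|monthly/<date>` directory layout, and the function names are hypothetical; the code only wraps the standard `hadoop distcp` command with its `-update` flag and writes each run to a dated directory, which is one way to keep separate weekly and monthly copies:

```python
import subprocess
from datetime import date

# Hypothetical cluster addresses; replace with your own NameNode URIs.
SOURCE = "hdfs://prod-nn:8020/data"
BACKUP_ROOT = "hdfs://backup-nn:8020/backups"


def distcp_command(source: str, backup_root: str, kind: str, run_date: date) -> list[str]:
    """Build a `hadoop distcp` invocation that copies `source` into a dated
    directory such as <backup_root>/weekly/2013-01-27 (hypothetical layout)."""
    if kind not in ("weekly", "monthly"):
        raise ValueError(f"unknown backup kind: {kind}")
    target = f"{backup_root}/{kind}/{run_date.isoformat()}"
    # -update only copies files that are missing or differ at the target,
    # so re-running a failed job for the same date does not start from zero.
    return ["hadoop", "distcp", "-update", source, target]


def run_backup(kind: str, run_date: date) -> None:
    """Run the copy; check=True raises CalledProcessError if distcp fails."""
    subprocess.run(distcp_command(SOURCE, BACKUP_ROOT, kind, run_date), check=True)
```

A scheduler such as cron could call `run_backup("weekly", date.today())` once a week. Each dated directory is a full copy, so pruning directories older than 12 months has to be handled separately.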
Re: How to back up HDFS data?
Steve Edison 2013-01-24, 23:34
Backing up to disk is what we do right now. Since distcp copies between HDFS clusters, does that mean I will have to build another 12-node cluster?
Re: How to back up HDFS data?
Harsh J 2013-01-25, 06:23
The backup cluster needs enough storage capacity to hold the data. Lower replication (<3) may also be an option there, to save yourself some disks/nodes.
-- Harsh J
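The point about lower replication can be made concrete: raw disk on the backup cluster scales linearly with the HDFS replication factor. A rough sketch, where the 25% free-space headroom and the per-node disk size are assumptions to replace with your own hardware figures:

```python
import math


def raw_capacity_tb(logical_tb: float, replication: int, headroom: float = 0.25) -> float:
    """Raw disk (TB) needed to store `logical_tb` at the given replication
    factor while keeping `headroom` (a fraction of raw capacity) free.
    The 25% default is an assumption, not an HDFS requirement."""
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    return logical_tb * replication / (1 - headroom)


def nodes_needed(logical_tb: float, replication: int, tb_per_node: float,
                 headroom: float = 0.25) -> int:
    """DataNodes required, assuming every node contributes `tb_per_node` TB."""
    return math.ceil(raw_capacity_tb(logical_tb, replication, headroom) / tb_per_node)


if __name__ == "__main__":
    # 44 TB = 20 TB today + 12 months x 2 TB/month; 12 TB/node is hypothetical.
    for rep in (3, 2):
        print(rep, nodes_needed(44, rep, tb_per_node=12))
```

With those hypothetical numbers, a year's worth of data (44 TB) needs about 15 nodes at replication 3 but about 10 at replication 2, at the cost of tolerating fewer simultaneous disk or node failures on the backup cluster.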