uday chopra 2012-11-06, 00:19
What do folks do to back up HDFS data? Does anyone have experience using enterprise solutions such as NetBackup with a Data Domain D2D appliance to back up data in HDFS? If so, what is the average dedup ratio? (I understand mileage can vary based on the type of data.)
Thanks, Uday
-
Re: backup of hdfs data
Ted Dunning 2012-11-06, 00:47
Conventional enterprise backup systems are rarely scaled for Hadoop needs. Both bandwidth and capacity are typically lacking.
My employer, MapR, offers a Hadoop-derived distribution that includes both point-in-time snapshots and remote mirrors. Contact me offline for more info.
-
Re: backup of hdfs data
Jay Vyas 2012-11-06, 01:21
Amazon has a really cheap, large-scale backup solution called Glacier, which is good if you're just backing up for the sake of archival in emergencies. If you need the archive to be performant, then you might want to just consider a higher replication factor.
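For readers who want to try the replication suggestion, here is a minimal sketch that builds a `hadoop fs -setrep` command line. The path `/data/important` and the factor 5 are hypothetical values chosen for illustration, and the helper name `setrep_command` is not from the thread. Note that extra replicas inside one cluster do not protect against accidental deletion or the loss of the whole cluster.

```python
def setrep_command(path, replication, wait=True):
    """Build a `hadoop fs -setrep` command line for one HDFS path."""
    if replication < 1:
        raise ValueError("replication factor must be at least 1")
    cmd = ["hadoop", "fs", "-setrep"]
    if wait:
        # -w asks the shell to wait until re-replication has completed.
        cmd.append("-w")
    cmd += [str(replication), path]
    return cmd


if __name__ == "__main__":
    # Hypothetical path and factor; printed rather than executed so the
    # sketch is safe to run on a machine without a Hadoop client.
    print(" ".join(setrep_command("/data/important", 5)))
```

To actually apply it, the returned list could be passed to `subprocess.run` on a host with a configured Hadoop client.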
-
Re: backup of hdfs data
Michael Segel 2012-11-06, 04:44
You have other options.
You could create a secondary cluster. You could also look into Cleversafe and what they are doing with Hadoop.
Here's the sad thing about backing up to tape... you can dump a couple of tens of TB to tape. Then you lose your system. How long will it take to recover? And that's the thing: you need to think about this not just in terms of DR (disaster recovery) but also in terms of BCP (business continuity planning).
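To put a rough number on the recovery question, a back-of-the-envelope estimate is restore time = data size / sustained restore throughput. The sketch below is only that arithmetic. The 20 TB and 100 MB/s figures are assumptions picked for illustration, not measurements from anyone in this thread, and real restores also pay for tape mounts, seeks, and re-ingest into HDFS.

```python
def restore_hours(data_tb, throughput_mb_per_s):
    """Estimate hours needed to stream data_tb terabytes at a sustained rate.

    Uses decimal units: 1 TB = 10**12 bytes, 1 MB = 10**6 bytes.
    """
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    total_bytes = data_tb * 10**12
    bytes_per_second = throughput_mb_per_s * 10**6
    return total_bytes / bytes_per_second / 3600


if __name__ == "__main__":
    # Assumed figures: 20 TB restored at a sustained 100 MB/s.
    print(f"{restore_hours(20, 100):.1f} hours")
```

Under those assumed figures the estimate comes to roughly 55.6 hours, that is, more than two days before the data is even back on disk.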
-
Re: backup of hdfs data
Bharath Mundlapudi 2012-11-06, 05:10
If you have little data in your cluster (say, less than a few GB), then the answer is yes, but it is an expensive route. For large data sets, traditional means are neither feasible nor cheap. If you want a cost-optimal solution, you could set up another local or remote cluster and try distcp, or simply copy HDFS files to JBODs. Disk is cheap :).
-Bharath
-
Re: backup of hdfs data
Serge Blazhiyevskyy 2012-11-06, 07:40
I second this proposed solution. DistCp works very well for backing up data to a separate cluster.
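As a minimal sketch of the distcp approach discussed above, the helper below builds a `hadoop distcp` command line for copying a directory from a primary cluster to a backup cluster. The NameNode hosts `nn-primary` and `nn-backup`, the port, and the paths are all hypothetical placeholders, and the helper name `distcp_command` is not from the thread.

```python
def distcp_command(src, dst, update=True, preserve=True):
    """Build a `hadoop distcp` command line copying src to dst."""
    cmd = ["hadoop", "distcp"]
    if update:
        # -update copies only files that are missing or differ at the target.
        cmd.append("-update")
    if preserve:
        # -p preserves file status (for example permissions and ownership).
        cmd.append("-p")
    cmd += [src, dst]
    return cmd


if __name__ == "__main__":
    # Hypothetical cluster addresses; printed rather than executed so the
    # sketch is safe to run on a machine without a Hadoop client.
    print(" ".join(distcp_command(
        "hdfs://nn-primary:8020/data",
        "hdfs://nn-backup:8020/backup/data",
    )))
```

Run on a schedule, a command like this gives incremental copies to the second cluster. It does not by itself keep older versions, so a file deleted or corrupted on the source can still be lost on the target if the job is later run with options that synchronize deletions or overwrite changed files.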
-
Re: backup of hdfs data
uday chopra 2012-11-06, 16:56
Thanks for all the responses. This is very useful information.