Re: backup of hdfs data
I second this proposed solution. DistCp works very well for backing up data to a separate cluster.

From: Bharath Mundlapudi <[EMAIL PROTECTED]>
Reply-To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>, Bharath Mundlapudi <[EMAIL PROTECTED]>
Date: Tuesday, November 6, 2012 7:10 AM
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Subject: Re: backup of hdfs data

If your cluster holds little data (say, less than a few GB), then the answer is yes, but it is an expensive route. For large data sets, traditional backup methods are not feasible and are expensive as well.
If you want a cost-effective solution, you could set up another local or remote cluster and use distcp, or simply copy the HDFS files to JBODs. Disk is cheap :).
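
For illustration, here is a minimal sketch of how a cluster-to-cluster copy with distcp might be scripted. The NameNode addresses and paths are hypothetical placeholders, and the helper name build_distcp_command is made up for this example; it only assembles the command line and does not run anything. It assumes the hadoop client is on the PATH of whichever host eventually runs the command.

```python
import shlex
from typing import List


def build_distcp_command(source: str, target: str,
                         update: bool = True,
                         preserve: bool = True) -> List[str]:
    """Assemble a `hadoop distcp` argument list (hypothetical helper).

    source/target are full URIs, e.g. hdfs://<namenode>:<port>/<path>.
    """
    cmd = ["hadoop", "distcp"]
    if update:
        # -update: copy only files that are missing or differ at the target
        cmd.append("-update")
    if preserve:
        # -p: preserve file attributes on the copied files
        cmd.append("-p")
    cmd += [source, target]
    return cmd


if __name__ == "__main__":
    # Placeholder cluster addresses; substitute your own NameNodes and paths.
    cmd = build_distcp_command(
        "hdfs://prod-nn:8020/data",
        "hdfs://backup-nn:8020/backups/data",
    )
    # Print the command for review; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Running the copy on a schedule (for example from cron) with -update keeps the backup cluster roughly in sync without recopying unchanged files.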

-Bharath
________________________________
From: uday chopra <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Monday, November 5, 2012 4:19 PM
Subject: backup of hdfs data

What do folks do to back up HDFS data?
Has anyone had experience using enterprise solutions such as NetBackup with a Data Domain D2D appliance to back up data in HDFS? If so, what is the average dedup ratio? (I understand mileage can vary based on the type of data.)

Thanks,
Uday