Hadoop >> mail # user >> Backing up HDFS?


Nathan Marz          2009-02-10, 00:17
Amandeep Khurana     2009-02-10, 00:41
Nathan Marz          2009-02-10, 01:06
Brian Bockelman      2009-02-10, 01:08
Allen Wittenauer     2009-02-10, 01:22

Re: Backing up HDFS?
Hey,

There's also a ticket open to enable global snapshots for a single HDFS
instance: https://issues.apache.org/jira/browse/HADOOP-3637. While this
doesn't solve the multi-site backup issue, it does provide stronger
protection against programmatic deletion of data in a single cluster.
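
[Editorial note: the snapshot work tracked in HADOOP-3637 and its follow-ups eventually shipped as HDFS snapshots in Hadoop 2.1+. A minimal sketch of that feature as it later landed, assuming a running cluster and an existing /data directory:]

```shell
# Editorial sketch (not part of the original thread): HDFS snapshots,
# as the HADOOP-3637 line of work eventually shipped in Hadoop 2.1+.
# Assumes a running cluster and an existing /data directory.

# Mark the directory as snapshottable (requires superuser).
hdfs dfsadmin -allowSnapshot /data

# Take a read-only, point-in-time snapshot named "s0".
hdfs dfs -createSnapshot /data s0

# Snapshot contents appear under the hidden .snapshot path; files
# deleted from /data remain readable here until the snapshot is removed.
hdfs dfs -ls /data/.snapshot/s0

# Drop the snapshot once it is no longer needed.
hdfs dfs -deleteSnapshot /data s0
```

As the message notes, this guards against accidental or programmatic deletion within one cluster, but it is not a substitute for a multi-site backup.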

Regards,
Jeff

On Mon, Feb 9, 2009 at 5:22 PM, Allen Wittenauer <[EMAIL PROTECTED]> wrote:

> On 2/9/09 4:41 PM, "Amandeep Khurana" <[EMAIL PROTECTED]> wrote:
> > Why would you want to have another backup beyond HDFS? HDFS itself
> > replicates your data, so the reliability of the system shouldn't be a
> > concern (if at all it is)...
>
> I'm reminded of a previous job where a site administrator refused to make
> tape backups (despite our continual harassment and pointing out that he was
> in violation of the contract) because he said RAID was "good enough".
>
> Then the RAID controller failed. When we couldn't recover data "from the
> other mirror" he was fired.  Not sure how they ever recovered, esp.
> considering what the data was they lost.  Hopefully they had a paper trail.
>
> To answer Nathan's question:
>
> > On Mon, Feb 9, 2009 at 4:17 PM, Nathan Marz <[EMAIL PROTECTED]> wrote:
> >
> >> How do people back up their data that they keep on HDFS? We have many TB
> of
> >> data which we need to get backed up but are unclear on how to do this
> >> efficiently/reliably.
>
> The content of our HDFSes is loaded from elsewhere and is not considered
> 'the source of authority'.  It is the responsibility of the original
> sources
> to maintain backups and we then follow their policies for data retention.
> For user generated content, we provide *limited* (read: quota'ed) NFS space
> that is backed up regularly.
>
> Another strategy we take is multiple grids in multiple locations that get
> the data loaded simultaneously.
>
> The key here is to prioritize your data.  Impossible to replicate data gets
> backed up using whatever means necessary, hard-to-regenerate data, next
> priority. Easy to regenerate and ok to nuke data, doesn't get backed up.
>
>
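
[Editorial note: the multi-location strategy Allen describes is commonly implemented with DistCp, Hadoop's distributed inter-cluster copy tool. A hedged sketch, with hypothetical NameNode hostnames nn1/nn2 and paths:]

```shell
# Editorial sketch (not part of the original thread): keeping a second
# cluster in sync with DistCp. Hostnames nn1/nn2, port 8020, and the
# /data and /backup/data paths are hypothetical placeholders.

# One-shot copy of /data from the primary cluster to the backup cluster.
hadoop distcp hdfs://nn1:8020/data hdfs://nn2:8020/backup/data

# Incremental refresh: -update copies only files that changed at the
# source; -delete removes files at the target that no longer exist
# at the source.
hadoop distcp -update -delete \
    hdfs://nn1:8020/data hdfs://nn2:8020/backup/data
```

DistCp runs as a MapReduce job, so the copy is parallelized across the cluster rather than funneled through a single host.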
Steve Loughran       2009-02-11, 10:44
Stefan Podkowinski   2009-02-12, 09:41
lohit                2009-02-10, 02:05
dan.paulus           2010-08-03, 13:54