Hadoop HDFS Backup/Restore Solutions


Mac Noland 2012-01-03, 20:53
alo alt 2012-01-03, 21:10
Re: Hadoop HDFS Backup/Restore Solutions
MapR provides this out of the box in a completely Hadoop-compatible
environment.

Doing this with straight Hadoop involves a fair bit of baling wire.

On Tue, Jan 3, 2012 at 1:10 PM, alo alt <[EMAIL PROTECTED]> wrote:

> Hi Mac,
>
> HDFS currently has no built-in solution for a complete backup and restore
> process along the lines of ITIL or ISO 9000. One strategy could be to
> "park" the data from HDFS that you want to back up on tape: copy it with
> "distcp" to another backup cluster and snapshot it from there with the
> SAN's mechanisms (a sketch of the distcp step follows below the quoted
> thread). For that, the DataNode storage has to be located on the SAN box.
>
> - Alex
>
> On Tuesday, January 3, 2012, Mac Noland <[EMAIL PROTECTED]> wrote:
> > Good day,
> >
> > I’m guessing this question has been asked a myriad of times, but we’re
> > about to get serious with some of our Hadoop implementations, so I wanted
> > to re-ask to see if I’m missing anything, or if others happen to know if
> > this might be on a future road map.
> >
> > For our current storage offerings (e.g. NAS or SAN), we give businesses
> > the opportunity to choose 7-, 14-, or 45-day “backups” for their storage.
> > The purpose of the backup isn’t so much that they are worried about
> > losing their current data (we’re RAID’ed and have some stuff mirrored to
> > remote datacenters), but more that if they were to delete some data
> > today, they can recover from yesterday’s backup. Or the day before’s
> > backup, or the day before that, etc. And to be honest, business units buy
> > a good portion of their backups to make people feel better and fulfill
> > custom contracts.
> >
> >
> > So far with HDFS we haven’t found too many formalized offerings for this
> > specific feature. While I haven’t done a ton of research, the best
> > solution I’ve found is an idea where we’d schedule a job to pull the data
> > locally to a mount that is backed up via our traditional methods (a
> > sketch of such a job follows below the quoted thread). See Michael
> > Segel’s first post on this site:
> > http://lucene.472066.n3.nabble.com/Backing-up-HDFS-td1019184.html
> >
> > Though we’d have to work through the details of what this would look
> > like for our support folks, it looks like something that could
> > potentially fit into our current model. We’d basically need to allocate
> > the same amount of SAN or NAS disk as we have for HDFS, then coordinate a
> > snapshot on the SAN or NAS via our traditional methods. Not sure what a
> > restore would look like, other than we could give the end users read
> > access to the NAS or SAN mounts so they can pick through what they need
> > to recover and let them figure out how to get it back into HDFS.
> >
> > For use cases like ours, where we’d need multi-day backups to fulfill
> > business needs, is this kind of what people are thinking or doing?
> > Moreover, is there anything on the Hadoop HDFS road map for providing,
> > for lack of a better word, an “enterprise” backup/restore solution?
> >
> > Thanks in advance,
> >
> > Mac Noland – Thomson Reuters
> >
>
> --
> Alexander Lorenz
> http://mapredit.blogspot.com
>
> Think of the environment: please don't print this email unless you
> really need to.
>
>
>
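A minimal sketch of the distcp step in Alex's suggestion above, assuming a
second backup cluster whose NameNode is reachable at backup-nn:8020; the host
names, paths, and dated directory layout are illustrative, not from the thread:

    # Copy the directory tree to a dated directory on the backup cluster.
    # -p preserves replication, ownership, and permissions. The backup
    # cluster's DataNode volumes would sit on the SAN, so the SAN-level
    # snapshot is taken there, outside of Hadoop.
    hadoop distcp -p \
        hdfs://prod-nn:8020/user/projects/reports \
        hdfs://backup-nn:8020/backups/reports/$(date +%F)

Copying over hdfs:// like this assumes the two clusters run compatible Hadoop
versions; for cross-version copies, distcp can read from the source over
hftp:// instead.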
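And a minimal sketch of the scheduled "pull to a traditionally backed-up
mount" job Mac describes, assuming it runs from cron on an edge node that has
both the Hadoop client and the NAS/SAN volume mounted; the script name, paths,
and schedule are made up for illustration:

    #!/bin/sh
    # nightly-hdfs-pull.sh -- copy a protected HDFS directory onto a mount
    # that the existing NAS/SAN backup tooling already covers.
    set -e

    SRC=/user/projects/reports                  # HDFS directory to protect (hypothetical)
    DEST=/mnt/nas-backup/hdfs-pull/$(date +%F)  # dated directory on the backed-up mount

    mkdir -p "$DEST"
    # Recursively pull the data out of HDFS; the regular NAS/SAN backup or
    # snapshot then picks it up along with everything else on the volume.
    hadoop fs -copyToLocal "$SRC" "$DEST"

A crontab entry such as "0 2 * * * /usr/local/bin/nightly-hdfs-pull.sh" would
run it nightly, and a restore would be the reverse direction, e.g.
hadoop fs -copyFromLocal from the mount back into HDFS.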
Arun C Murthy 2012-01-03, 22:15
Ossi 2012-01-05, 14:34
Mac Noland 2012-01-03, 21:31
Joe Stein 2012-01-03, 21:34
Alexander Lorenz 2012-01-03, 21:42