A list of HBase backup options
Hi,

I've got some data in HBase that I'd hate to lose.  Yeah, very original. :))
I know I can:
1) make an export/backup of 1 table at a time using
org.apache.hadoop.hbase.mapreduce.Export from HBASE-1684
2) copy 1 table at a time using
http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/mapreduce/CopyTable.html
3) use distcp to copy the whole /hbase part of HDFS
4) replicate the whole cluster - http://hbase.apache.org/replication.html
5) count on HDFS replication and live without a standard backup
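For reference, here's roughly how I understand options 1-4 would be invoked
(untested on my end; table names, paths, and ZooKeeper quorum addresses below
are just placeholders):

  # 1) Export: dump one table to sequence files on HDFS
  bin/hbase org.apache.hadoop.hbase.mapreduce.Export mytable /backup/mytable

  # 2) CopyTable: copy one table, optionally to another cluster via --peer.adr
  bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
      --peer.adr=backup-zk:2181:/hbase mytable

  # 3) distcp: copy the whole /hbase directory to another HDFS cluster
  bin/hadoop distcp hdfs://src-nn:8020/hbase hdfs://backup-nn:8020/hbase-backup

  # 4) replication: with hbase.replication=true in hbase-site.xml on both
  #    clusters, add the slave cluster as a peer from the HBase shell:
  #    hbase> add_peer '1', "backup-zk1,backup-zk2,backup-zk3:2181:/hbase"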
What I'm not sure about is the following:

1) Is any one of the above options "hot", meaning that it can be used while the
source cluster is running and that it produces a consistent backup (a snapshot
or checkpoint of the source cluster's data)?
I imagine only replication of the whole cluster (point 4 above) is really
"hot"?

2) If the HBase cluster lives in EC2, what's the best thing to do with the
backup/snapshot?  EBS may be too expensive.  Are people stuffing their HBase
backups into S3 somehow, despite the S3 per-object limit of 5 GB?
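For instance, I imagine something like distcp-ing an Export dump into S3 over
the s3n filesystem might work (untested; the bucket name is a placeholder, and
individual files would have to stay under the per-object limit):

  # with fs.s3n.awsAccessKeyId / fs.s3n.awsSecretAccessKey set in core-site.xml:
  bin/hadoop distcp /backup/mytable s3n://my-backup-bucket/mytable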

Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/