A list of HBase backup options

I've got some data in HBase that I'd hate to lose.  Yeah, very original. :))
I know I can:
1) make an export/backup of 1 table at a time using
org.apache.hadoop.hbase.mapreduce.Export from HBASE-1684
2) copy 1 table at a time using

3) use distcp to copy the whole /hbase part of HDFS
4) replicate the whole cluster - http://hbase.apache.org/replication.html
5) count on HDFS replication and live without the standard backup
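
For concreteness, options 1) and 3) come down to command lines roughly like the ones assembled in the sketch below. It is only an illustration under stated assumptions: the table name, output directory, and namenode addresses are made-up placeholders, and the helper function names are hypothetical. The script just builds and prints the commands; it does not run anything against a cluster.

```python
import shlex


def export_command(table, output_dir):
    """Build the command line for option 1: MapReduce Export of a single table."""
    return ["hbase", "org.apache.hadoop.hbase.mapreduce.Export", table, output_dir]


def distcp_command(src, dst):
    """Build the command line for option 3: distcp of the /hbase directory."""
    return ["hadoop", "distcp", src, dst]


if __name__ == "__main__":
    # Hypothetical table name, paths, and hosts, for illustration only.
    commands = [
        export_command("mytable", "/backups/mytable"),
        distcp_command("hdfs://source-nn:8020/hbase",
                       "hdfs://backup-nn:8020/hbase-backup"),
    ]
    for cmd in commands:
        # Quote each argument so the printed line is safe to paste into a shell.
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Neither command line says anything about whether the resulting copy is consistent when the source cluster is taking writes, which is the open question.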
What I'm not sure about is the following:

1) Is any one of the above options "hot", meaning that it can be used while the
source cluster is running and that it produces a consistent backup (a snapshot
or checkpoint of the source cluster's data)?
I imagine only replication of the whole cluster (option 4 above) is really

2) If the HBase cluster lives in EC2, what's the best thing to do with the
backup/snapshot?  EBS may be too expensive.  Are people stuffing their HBase
backups into S3 somehow, despite the S3 per-object limit of 5 GB?

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/