HBase >> mail # user >> A list of HBase backup options
A list of HBase backup options
Hi,

I've got some data in HBase that I'd hate to lose.  Yeah, very original. :))
I know I can:
1) make an export/backup of one table at a time using
org.apache.hadoop.hbase.mapreduce.Export from HBASE-1684
2) copy one table at a time using
http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/mapreduce/CopyTable.html

3) use distcp to copy the whole /hbase part of HDFS
4) replicate the whole cluster - http://hbase.apache.org/replication.html
5) count on HDFS replication and live without a standard backup
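
For reference, invocations for options 1)-3) look roughly like this (a sketch only -- the table name "mytable", the HDFS paths, and the host names are placeholders, and exact jar names/arguments vary by HBase and Hadoop version):

```shell
# Option 1: MapReduce-based export of a single table to an HDFS directory (HBASE-1684)
hbase org.apache.hadoop.hbase.mapreduce.Export mytable /backup/mytable

# Option 2: copy one table into another table (optionally on a different cluster)
hbase org.apache.hadoop.hbase.mapreduce.CopyTable --new.name=mytable_copy mytable

# Option 3: raw HDFS-level copy of the whole /hbase directory to another cluster
hadoop distcp hdfs://src-namenode:8020/hbase hdfs://backup-namenode:8020/hbase-backup
```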
What I'm not sure about is the following:

1) Are any of the above options "hot", meaning they can be run while the
source cluster is serving traffic and still produce a consistent backup (a
snapshot or checkpoint of the source cluster's data)?
I imagine only whole-cluster replication (option 4 above) is really "hot"?

2) If the HBase cluster lives in EC2, what's the best thing to do with the
backup/snapshot?  EBS may be too expensive.  Are people stuffing their HBase
backups into S3 somehow, despite the S3 per-object limit of 5 GB?

Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/