RE: HBase backup option
The backup tool takes snapshots of HFiles on a per-region basis. Before
copying anything, we flush the region and then list all of its files at
that moment. If we successfully copy a region, we assume the copied
files are consistent for that region, because HFiles are immutable. If
we can't copy an entire region, the region is marked as failed and
retried later. As a result, the HFile snapshots of different regions
may be taken at different times, so the backup tool alone can't
guarantee consistent table snapshots. This is why we also use WALPlayer,
which replays the logs up to the time we wish to restore to.
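
To make the per-region logic above easier to review, here is a minimal
Python sketch of it. This is not the tool's actual code: backup_table,
RegionBackup, flush_and_list and copy_file are hypothetical names
standing in for the real flush, listing and copy steps, and the retry
limit is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class RegionBackup:
    """Outcome of one successfully copied region (hypothetical structure)."""
    region: str
    files: List[str]
    flushed_at: int  # time at which the region was flushed and listed


def backup_table(
    regions: List[str],
    # Hypothetical hook: flushes a region, returns (timestamp, HFile names).
    flush_and_list: Callable[[str], Tuple[int, List[str]]],
    # Hypothetical hook: copies one HFile, raises OSError on failure.
    copy_file: Callable[[str, str], None],
    max_attempts: int = 3,  # assumed retry limit, not from the original tool
) -> Dict[str, RegionBackup]:
    done: Dict[str, RegionBackup] = {}
    pending = list(regions)
    for _ in range(max_attempts):
        failed = []
        for region in pending:
            # Flush first, then take the file list as of that moment.
            flushed_at, files = flush_and_list(region)
            try:
                for hfile in files:
                    copy_file(region, hfile)
            except OSError:
                # The whole region counts as failed; it is retried later
                # with a fresh flush and a fresh file list.
                failed.append(region)
                continue
            done[region] = RegionBackup(region, list(files), flushed_at)
        pending = failed
        if not pending:
            break
    if pending:
        raise RuntimeError("regions not backed up: " + ", ".join(pending))
    return done
```

Each RegionBackup carries its own flushed_at, and a retried region gets
a later one than its neighbours. That spread of timestamps is the reason
the copied HFiles alone are not a consistent table snapshot.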

Odds of something not recoverable? That's a very good question. So far
we haven't had a non-recoverable backup with our most current version.
But honestly, I don't know yet. We only recently completed this tool
and are still testing it. With my limited knowledge of HBase, it's
likely that I missed something, which is one of the reasons we are
releasing it: anyone interested can test it out, verify or break its
logic, and suggest improvements. In fact, I also included the algorithm
on our GitHub wiki page for this reason. Feel free to review it for
accuracy. In the meantime, I'll work on a better answer for this :)
Hopefully, we can find out soon. Thanks

-----Original Message-----
From: Vladimir Rodionov [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, May 08, 2012 7:38 PM
Subject: RE: HBase backup option


How did you achieve consistency? Are table snapshots consistent? If not,
what are the odds of ending up with something unrecoverable?

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com

From: Espinoza,Carlos [[EMAIL PROTECTED]]
Sent: Tuesday, May 08, 2012 1:46 PM
Subject: HBase backup option


I was asked to mention this on the mailing list. At OCLC, we are working
towards moving our data to HBase. A huge requirement is to back up our
data, obviously, and seeing that this is still a work in progress, we
decided to write something for ourselves. Using all the resources that
we found available on Jira, in the HBase book, etc., we came up with a
few tools that we are currently testing and using. We were able to
upload them to GitHub, so here they are


So far, they have been great for us. If anyone is interested in giving
them a try, please go ahead. Any feedback would be greatly appreciated.
Thanks!
