

Re: export/import for backup
I was thinking about this and have a couple thoughts...

While Stack's solution above would work, it means a couple of things: 1. if
you haven't saved splits, you're going to have to figure out how to pre-split
for a full restore.  2. you have to wait for the data re-sort at recovery
time instead of backup time, so recovery time will be substantially longer.

It seems like we should make a new script like export that automatically
exports the data as bulk importable along with all of the table's schema
and split information.  We then could make an import script that simply
creates the backed up table (to potentially a different target name) and
then bulk imports it, pre-splitting using the splits defined on export.
 (We actually did something like this recently to migrate data from one
format to another.)
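
To make the "export the schema and split information alongside the data" idea concrete, here is a minimal sketch in Python. It does not talk to HBase at all: build_manifest and restore_plan are hypothetical helper names, the manifest layout is an assumption, and the region start keys are assumed to have been read from the live table by some other means. It only shows what would need to be saved at export time so that the import side can pre-split a (possibly renamed) target table.

```python
import base64
import json


def build_manifest(table_name, column_families, region_start_keys):
    """Capture what a restore needs besides the data: schema and split points.

    region_start_keys are the start keys (bytes) of every region. The first
    region has an empty start key, which is not a split point, so it is dropped.
    """
    splits = sorted(k for k in region_start_keys if k)
    return json.dumps({
        "table": table_name,
        "families": list(column_families),
        # Row keys are arbitrary bytes, so encode them to survive JSON.
        "splits": [base64.b64encode(k).decode("ascii") for k in splits],
    })


def restore_plan(manifest_json, target_name=None):
    """Turn a saved manifest back into the arguments for creating the table."""
    manifest = json.loads(manifest_json)
    return {
        "table": target_name or manifest["table"],
        "families": manifest["families"],
        "splits": [base64.b64decode(s) for s in manifest["splits"]],
    }
```

The import script would then create the table from the plan with those split keys before bulk loading, so none of the exported files should straddle a region boundary.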

It wouldn't work for the case where you are trying to do a merged restore
(e.g. pre-existing table) but it seems like recovery would be really quick.
 I suppose you could allow it to support importing into an existing table
but then you may have to wait for splits on a bunch of the files (I know
the bulk import script is designed to do this but I'm not sure how it would
handle a large number of splits if your target table has diverged
substantially from when the backup was done).
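
A rough way to gauge how bad the "diverged target table" case would be is to count the exported files whose key range now crosses a region boundary. The Python sketch below is only an illustration of that check under stated assumptions: files_needing_split is a hypothetical name, each file is described by a made-up (name, first_key, last_key) tuple, and the current split keys are assumed to be known. It is not how the bulk import script itself decides.

```python
from bisect import bisect_right


def region_index(split_keys, row_key):
    """Index of the region holding row_key, given sorted split keys.

    Region i covers [split_keys[i-1], split_keys[i]), so a key equal to a
    split key belongs to the region that starts there.
    """
    return bisect_right(split_keys, row_key)


def files_needing_split(files, current_splits):
    """Return names of files whose first and last keys land in different regions."""
    needs = []
    for name, first_key, last_key in files:
        if region_index(current_splits, first_key) != region_index(current_splits, last_key):
            needs.append(name)
    return needs
```

If the target still has the splits it had at backup time, this list is empty; the more the table has split since, the more files show up in it.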

Jacques
On Mon, Feb 20, 2012 at 9:19 PM, Stack <[EMAIL PROTECTED]> wrote:

> On Mon, Feb 20, 2012 at 1:58 PM, Paul Mackles <[EMAIL PROTECTED]> wrote:
> > Actually, an hbase export to "bulk load" facility sounds like a great
> > idea. We have been using bulk loads to migrate data from an older data
> > store and they have worked awesome for us. It also doesn't seem like it
> > would be that hard to implement. So what am I missing?
> >
>
> Little?
>
> Check out the Import.java in mapreduce package.  See how it's pulling
> from SequenceFiles into a map that outputs to a TableOutputFormat
> inside the map.  Make a new MR job that has same input but that
> outputs to HFileOutputFormat instead (you'll need the total order
> partitioner and a reducer in the mix which Import doesn't have).
>
> St.Ack
>
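
For anyone wondering why the total order partitioner and the reducer are needed: bulk-loadable files have to be sorted by row key, and each file should fall within one region. The sketch below shows that idea in plain Python rather than MapReduce. partition_and_sort is a hypothetical name, cells are simplified to (row_key, value) pairs, and the split keys are assumed to be given; it is an illustration of the technique, not the HBase implementation.

```python
from bisect import bisect_right
from collections import defaultdict


def partition_and_sort(cells, split_keys):
    """Group cells by region and sort each group by row key.

    The bisect lookup plays the role of the total order partitioner: every
    key in partition i sorts before every key in partition i + 1. The sort
    plays the role of the reducer, which sees its keys in order.
    """
    partitions = defaultdict(list)
    for row_key, value in cells:
        partitions[bisect_right(split_keys, row_key)].append((row_key, value))
    return {
        index: sorted(group, key=lambda cell: cell[0])
        for index, group in sorted(partitions.items())
    }
```

Concatenating the partitions in index order gives one globally sorted run, which is what lets each partition be written out as its own file for a single region.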