Paul Mackles 2012-02-20, 21:20
Stack 2012-02-20, 21:28
Paul Mackles 2012-02-20, 21:58
Stack 2012-02-21, 05:19
I was thinking about this and have a couple of thoughts.
While Stack's solution above would work, it has a couple of consequences:
1. If you haven't saved your splits, you're going to have to figure out how
to pre-split the table for a full restore.
2. You have to wait for the data to be re-sorted at recovery time instead of
at backup time, so recovery will take substantially longer.
It seems like we should make a new tool, similar to Export, that
automatically exports the data in bulk-importable form along with all of the
table's schema and split information. We could then make an import script
that simply creates the backed-up table (potentially under a different
target name), pre-splits it using the splits recorded at export, and then
bulk imports the data. (We actually did something like this recently to
migrate data from one format to another.)
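To make the pre-split step concrete, here is a minimal sketch, assuming the export writes out the table's split keys (the start keys of every region except the first) sorted in HBase's unsigned lexicographic byte order. `SplitPoints` and its methods are hypothetical names, not part of HBase; they only show how a row key maps to a region once the target table has been created with those same boundaries.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper (not an HBase class): given the split keys saved at
// export time, find the region of the re-created table that would hold a row.
// Region 0 covers keys before splits[0]; region i covers [splits[i-1], splits[i]);
// the last region runs to the end of the key space.
public class SplitPoints {
    public static int regionFor(byte[][] splits, byte[] row) {
        // Binary search for the number of split keys <= row.
        int lo = 0, hi = splits.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            // Unsigned lexicographic order, the order HBase uses for row keys.
            if (Arrays.compareUnsigned(row, splits[mid]) >= 0) {
                lo = mid + 1; // a row equal to a split key starts that region
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    public static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Made-up split keys: three regions [, g), [g, p), [p, )
        byte[][] splits = { bytes("g"), bytes("p") };
        for (String row : new String[] { "apple", "g", "kiwi", "zebra" }) {
            System.out.println(row + " -> region " + regionFor(splits, bytes(row)));
        }
    }
}
```

Running `main` prints `apple -> region 0`, `g -> region 1`, `kiwi -> region 1` and `zebra -> region 2`. A binary search keeps the lookup cheap even for tables with thousands of regions.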
It wouldn't work for a merged restore (e.g. into a pre-existing table), but
it seems like recovery would be really quick. I suppose you could let it
import into an existing table, but then you may have to wait while a bunch
of the files are split. (I know the bulk import tool is designed to do this,
but I'm not sure how it would handle a large number of splits if your target
table has diverged substantially since the backup was taken.)
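To gauge the cost of a merged restore, here is a rough sketch under the assumption that each exported file covers one contiguous key range [firstRow, lastRow]. It counts how many regions of the target table that range touches; every region beyond the first means the bulk loader has to split the file before it can be loaded. `StraddleCount` and its methods are hypothetical names, and the split keys in `main` are made-up examples.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: how many regions of the *target* table does an
// exported file's key range [firstRow, lastRow] touch? Each extra region
// means the file must be split before it can be bulk loaded.
public class StraddleCount {
    // Index of the region holding `row`: the number of split keys <= row,
    // compared as unsigned bytes.
    static int regionFor(byte[][] splits, byte[] row) {
        int lo = 0, hi = splits.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (Arrays.compareUnsigned(row, splits[mid]) >= 0) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    public static int regionsSpanned(byte[][] targetSplits, byte[] firstRow, byte[] lastRow) {
        return regionFor(targetSplits, lastRow) - regionFor(targetSplits, firstRow) + 1;
    }

    public static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The file was written for region [g, p) at backup time...
        byte[] first = bytes("g"), last = bytes("ozone");
        byte[][] atBackup = { bytes("g"), bytes("p") };
        // ...but the live table has since split that range further.
        byte[][] today = { bytes("g"), bytes("i"), bytes("k"), bytes("m"), bytes("p") };
        System.out.println("at backup: " + regionsSpanned(atBackup, first, last)); // 1
        System.out.println("today:     " + regionsSpanned(today, first, last));    // 4
    }
}
```

The same file that loads as-is into a table with the backup's boundaries would have to be cut into four pieces against the diverged table; multiplied across every exported file, that is the extra wait described above.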
On Mon, Feb 20, 2012 at 9:19 PM, Stack <[EMAIL PROTECTED]> wrote:
> On Mon, Feb 20, 2012 at 1:58 PM, Paul Mackles <[EMAIL PROTECTED]> wrote:
> > Actually, an HBase export to "bulk load" facility sounds like a great
> > idea. We have been using bulk loads to migrate data from an older data
> > store and they have worked great for us. It also doesn't seem like it
> > would be that hard to implement. So what am I missing?
> Check out Import.java in the mapreduce package. See how it's pulling
> from SequenceFiles in a map that outputs to a TableOutputFormat. Make
> a new MR job that has the same input but that outputs to
> HFileOutputFormat instead (you'll need the total order partitioner
> and a reducer in the mix, which Import doesn't have).
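The job Stack describes can be modeled without any Hadoop or HBase classes. This is only an in-memory sketch (hypothetical class `TotalOrderModel`, with row keys standing in for full KeyValues): the total order partitioner routes each key to the reducer for the region that will own it, and each reducer writes its keys in ascending order, which HFiles require. In a real job, as far as I know, `HFileOutputFormat.configureIncrementalLoad` is what sets up that partitioner and a sorting reducer from an existing table's region boundaries, so the target table has to be pre-split first.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// In-memory model (not Hadoop code) of the shuffle described above:
// a total-order partitioner routes each row key to the reducer for its
// region, and each reducer emits its keys sorted, as an HFile needs.
public class TotalOrderModel {
    // Partition index = number of split keys <= row (unsigned byte order).
    static int partitionFor(byte[][] splits, byte[] row) {
        int lo = 0, hi = splits.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (Arrays.compareUnsigned(row, splits[mid]) >= 0) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    public static List<List<String>> partitionAndSort(byte[][] splits, List<String> rows) {
        // One bucket per region, i.e. splits.length + 1 reducers.
        List<List<byte[]>> buckets = new ArrayList<>();
        for (int i = 0; i <= splits.length; i++) buckets.add(new ArrayList<>());
        // Map output arrives in arbitrary order; route each key to its partition.
        for (String r : rows) {
            byte[] key = r.getBytes(StandardCharsets.UTF_8);
            buckets.get(partitionFor(splits, key)).add(key);
        }
        // Each "reducer" sorts its own keys; no global sort is needed.
        List<List<String>> out = new ArrayList<>();
        for (List<byte[]> bucket : buckets) {
            bucket.sort((x, y) -> Arrays.compareUnsigned(x, y));
            List<String> sorted = new ArrayList<>();
            for (byte[] k : bucket) sorted.add(new String(k, StandardCharsets.UTF_8));
            out.add(sorted);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[][] splits = { "g".getBytes(StandardCharsets.UTF_8), "p".getBytes(StandardCharsets.UTF_8) };
        List<String> mapOutput = Arrays.asList("zebra", "banana", "kiwi", "quince", "apple", "grape");
        System.out.println(partitionAndSort(splits, mapOutput));
    }
}
```

Running `main` prints `[[apple, banana], [grape, kiwi], [quince, zebra]]`: one sorted run per region, which is why this sort cost can be paid at backup time rather than at recovery time.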
lars hofhansl 2012-02-24, 07:27
lars hofhansl 2012-02-21, 17:27
lars hofhansl 2012-02-22, 01:55
lars hofhansl 2012-02-24, 02:12