Bulk Loading - LoadIncrementalHFiles
Amit Sela 2012-11-01, 17:03
I'm using MR to bulk load into HBase: I set up the job with
HFileOutputFormat.configureIncrementalLoad, and after the job is
complete I call LoadIncrementalHFiles.doBulkLoad.
From what I see, the MR job outputs a file for each CF written, and to my
understanding these files are loaded as store files into a region.
What I don't understand is *how many regions will open*? and *how is that
If I have 3 CFs and a lot of data to load, does that mean 3 large store
files will load into 1 region (or more?) and this region will split on major
Can I pre-create regions and tell the bulk load to split the data between
them during the load?
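
To make the pre-split question concrete, here is a minimal plain-Java sketch of how I picture the matching step. It assumes that an HFile can go into a region as-is only when its first and last row keys both fall inside that region's key range, and that a file spanning a region boundary has to be split first. It uses no HBase classes. The class name, the method names, and the String keys (HBase compares raw bytes) are all hypothetical simplifications, not the actual LoadIncrementalHFiles code.

```java
import java.util.Arrays;

public class RegionGroupingSketch {

    /**
     * Returns the index of the region whose [startKey, nextStartKey) range
     * contains rowKey. sortedStartKeys must be sorted and begin with "",
     * which stands for the open start key of the first region.
     */
    static int regionIndexFor(String[] sortedStartKeys, String rowKey) {
        int pos = Arrays.binarySearch(sortedStartKeys, rowKey);
        // Exact hit: rowKey is itself a region start key.
        // Miss: binarySearch returns -(insertionPoint) - 1, and the
        // containing region is the one just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** True if the whole key range [firstKey, lastKey] lies in one region. */
    static boolean fitsInOneRegion(String[] sortedStartKeys,
                                   String firstKey, String lastKey) {
        return regionIndexFor(sortedStartKeys, firstKey)
                == regionIndexFor(sortedStartKeys, lastKey);
    }

    public static void main(String[] args) {
        // Four hypothetical pre-created regions: [,g) [g,n) [n,t) [t,)
        String[] startKeys = {"", "g", "n", "t"};

        // Hypothetical HFile key ranges produced by the MR job.
        String[][] hfiles = {{"a", "f"}, {"e", "h"}, {"t", "z"}};

        for (String[] range : hfiles) {
            boolean fits = fitsInOneRegion(startKeys, range[0], range[1]);
            System.out.println("[" + range[0] + ", " + range[1] + "] "
                    + (fits ? "fits region " + regionIndexFor(startKeys, range[0])
                            : "crosses a region boundary, would need a split"));
        }
    }
}
```

If this picture is right, pre-creating the regions before running the MR job should let each output file line up with a single region, so no splitting is needed at load time.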
In general, if someone could elaborate on LoadIncrementalHFiles, it would
save me a lot of time diving into it.
Another question is about overwriting values: is it possible to load an
updated value? Or, more generally, to update columns and values for an existing
I'd think that there's no problem, but when I run the same bulk load
twice (MR and then load) with the same data, the second run fails.
Right after mapreduce.LoadIncrementalHFiles: Trying to load hfile=........
I get: ERROR mapreduce.LoadIncrementalHFiles: Unexpected execution
exception during splitting...