Amit Sela 2012-11-01, 17:03
Yes, the table can be pre-split before doing the bulk load. The MR job will then run with the same number of reducers as there are regions, one per region. Each HFile that a reducer generates is capped at the configured maximum HFile size.
During the bulk load itself, HFiles are also split if needed, for example when regions have split in the meantime and an HFile no longer fits within a single region.
And yes, if the table is not pre-split, it will lead to region splits later on...
Pre-splitting is the better way, I would say.
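To make the "one reducer per region" point concrete, here is a minimal sketch of how a sorted list of split keys maps a row key to a region, and so to the reducer that writes its HFile. It uses only the JDK and is not HBase code: the class name RegionPartitionSketch, the method regionFor, and the sample split keys are all made up for illustration. It assumes rows are ordered as unsigned bytes and that each region covers the range from its start key (inclusive) up to the next split key (exclusive).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; none of these names come from HBase.
class RegionPartitionSketch {

    // Lexicographic comparison of byte arrays, treating bytes as unsigned.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // N sorted split keys define N + 1 regions. Region 0 holds every key
    // below splitKeys[0]; region i holds keys in [splitKeys[i-1], splitKeys[i]).
    // The returned index doubles as the reducer index in this sketch.
    static int regionFor(byte[] rowKey, byte[][] splitKeys) {
        int lo = 0;
        int hi = splitKeys.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compareBytes(rowKey, splitKeys[mid]) >= 0) {
                lo = mid + 1; // key is at or past this split point
            } else {
                hi = mid;     // key sorts before this split point
            }
        }
        return lo;
    }

    // Convenience overload for string keys (UTF-8 encoded).
    static int regionFor(String rowKey, String... splitKeys) {
        byte[][] splits = new byte[splitKeys.length][];
        for (int i = 0; i < splitKeys.length; i++) {
            splits[i] = splitKeys[i].getBytes(StandardCharsets.UTF_8);
        }
        return regionFor(rowKey.getBytes(StandardCharsets.UTF_8), splits);
    }

    public static void main(String[] args) {
        // Three made-up split keys, so four regions and four reducers.
        String[] splits = {"g", "n", "t"};
        for (String row : new String[] {"apple", "grape", "n", "zebra"}) {
            System.out.println(row + " -> region " + regionFor(row, splits));
        }
    }
}
```

With a table that has no split keys at all, every row lands in region 0, which is why an un-split table ends up with a single reducer and one very large region that has to split afterwards.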
From: Amit Sela [[EMAIL PROTECTED]]
Sent: Thursday, November 01, 2012 10:33 PM
To: [EMAIL PROTECTED]
Subject: Bulk Loading - LoadIncrementalHFiles
I'm using MR to bulk load into HBase by
using HFileOutputFormat.configureIncrementalLoad, and after the job is
complete I use LoadIncrementalHFiles.doBulkLoad.
From what I see, the MR outputs a file for each CF written and to my
understanding these files are loaded as store files into a region.
What I don't understand is *how many regions will open* ? and *how is that
If I have 3 CF's and a lot of data to load, does that mean 3 large store
files will load into 1 region (more ?) and this region will split on major
Can I pre-create regions and tell the bulk load to split the data between
them during the load ?
In general, if someone could elaborate about LoadIncrementalHFiles it would
save me a lot of time diving into it.
Another question is about overwriting values: is it possible to load an
updated value ? or generally updating columns and values for an existing
I'd think that there's no problem but when I try to run the same bulk load
twice (MR and then load) with the same data, the second time fails.
Right after mapreduce.LoadIncrementalHFiles: Trying to load hfile=........
I get: ERROR mapreduce.LoadIncrementalHFiles: Unexpected execution
exception during splitting...