I have about 1 billion values that I am trying to load into a new HBase table
(with a single column family containing just one column), but I'm running
into some issues. Currently I am trying to import them with MapReduce by
first converting them to HFiles and then calling
LoadIncrementalHFiles.doBulkLoad(); I also use
HFileOutputFormat.configureIncrementalLoad() as part of my MR job. My code
is essentially the same as this example:
The problem I'm running into is that configureIncrementalLoad() creates only
1 reducer, and that one node does not have enough space to handle all this
data. configureIncrementalLoad() is supposed to start one reducer for every
region the table has, so apparently the table has only 1 region -- presumably
because it is brand new and empty, and a new table starts with a single
region unless split keys are supplied at creation time (my understanding of
how regions work is not crystal clear). The cluster has 5 region servers, so
I'd like at least that many reducers handling this load.
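One way to get more reducers is to pre-split the table at creation time so it starts with as many regions as you want reducers. Good split keys depend on your actual row-key distribution; as a sketch (assuming keys are roughly uniform over a known byte range), the following computes evenly spaced split points, similar in spirit to HBase's Bytes.split() helper. Everything here is plain Java with no HBase dependency:

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/**
 * Computes evenly spaced split keys over a byte-key range. Creating the
 * table pre-split with these keys gives it numRegions regions up front,
 * so HFileOutputFormat.configureIncrementalLoad() starts one reducer
 * per region instead of just one.
 */
public class SplitKeys {

    // Returns numRegions - 1 split keys evenly spaced in [start, end).
    static List<byte[]> evenSplits(byte[] start, byte[] end, int numRegions) {
        int width = Math.max(start.length, end.length);
        // Interpret the (right-padded) keys as non-negative big integers.
        BigInteger lo = new BigInteger(1, pad(start, width));
        BigInteger hi = new BigInteger(1, pad(end, width));
        BigInteger range = hi.subtract(lo);
        List<byte[]> splits = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            BigInteger split = lo.add(
                range.multiply(BigInteger.valueOf(i))
                     .divide(BigInteger.valueOf(numRegions)));
            splits.add(toFixedWidth(split, width));
        }
        return splits;
    }

    private static byte[] pad(byte[] key, int width) {
        byte[] padded = new byte[width];              // right-pad with 0x00
        System.arraycopy(key, 0, padded, 0, key.length);
        return padded;
    }

    private static byte[] toFixedWidth(BigInteger value, int width) {
        byte[] raw = value.toByteArray();  // may carry a sign byte or be short
        byte[] out = new byte[width];
        int copy = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - copy, out, width - copy, copy);
        return out;
    }

    public static void main(String[] args) {
        // Example: single-byte keys spanning 0x00..0xFF, 5 regions -> 4 splits.
        List<byte[]> splits = evenSplits(
            new byte[] {0x00}, new byte[] {(byte) 0xFF}, 5);
        for (byte[] s : splits) {
            System.out.println(String.format("%02x", s[0] & 0xff));
        }
    }
}
```

You could then create the table pre-split, e.g. in the HBase shell with `create 'mytable', 'cf', SPLITS => [...]`, or by passing the split keys to HBaseAdmin.createTable(descriptor, splits). With 5 regions up front, configureIncrementalLoad() should launch 5 reducers, and the bulk-loaded HFiles land on all 5 region servers instead of one. (Table and family names here are placeholders.)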
On a side note, I also tried the command-line tool completebulkload, but ran
into other issues with it (timeouts, possibly heap problems) -- probably
because only one server was assigned all the inserting: looking at the
region servers' logs, only one of them has any entries; the rest are idle.
Any help is appreciated.