HBase user mailing list - Bulkload into empty table with configureIncrementalLoad()


Re: Bulkload into empty table with configureIncrementalLoad()
Jean-Daniel Cryans 2013-09-19, 16:55
You need to create the table with pre-split regions; see
http://hbase.apache.org/book.html#perf.writing
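As an illustration of pre-splitting, here is a minimal sketch in the spirit of the pre-split example in the HBase book. It assumes (hypothetically) that row keys are fixed-width, 16-character lowercase hex strings spread roughly uniformly over the key space. `SplitKeys` and `hexSplits` are illustrative names, not HBase API. The helper computes numRegions - 1 evenly spaced boundaries, which yields numRegions regions.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper (not part of HBase): computes numRegions - 1 evenly
 * spaced split keys over a hex-encoded row-key space. Assumes row keys are
 * 16-char lowercase hex strings; adapt this to your real key distribution.
 */
public class SplitKeys {

    public static byte[][] hexSplits(String startKey, String endKey, int numRegions) {
        if (numRegions < 2) {
            throw new IllegalArgumentException("need at least 2 regions to split");
        }
        BigInteger low = new BigInteger(startKey, 16);
        BigInteger high = new BigInteger(endKey, 16);
        // Width of each region's slice of the key space.
        BigInteger step = high.subtract(low).divide(BigInteger.valueOf(numRegions));
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            BigInteger key = low.add(step.multiply(BigInteger.valueOf(i)));
            // Zero-pad to 16 hex chars so split keys sort the same way as row keys.
            splits[i - 1] = String.format("%016x", key).getBytes(StandardCharsets.US_ASCII);
        }
        return splits;
    }

    public static void main(String[] args) {
        // One region per region server on a 5-node cluster -> 4 split keys.
        byte[][] splits = hexSplits("0000000000000000", "ffffffffffffffff", 5);
        for (byte[] s : splits) {
            System.out.println(new String(s, StandardCharsets.US_ASCII));
        }
        // Prints 3333333333333333, 6666666666666666, 9999999999999999, cccccccccccccccc
    }
}
```

The resulting array is what you would pass as the split keys when creating the table, for example through `HBaseAdmin.createTable(HTableDescriptor, byte[][])` in the 0.94-era client. With 5 regions in place, configureIncrementalLoad() should then start 5 reducers. If the keys are not uniformly distributed, picking split points from a sample of real keys is likely to balance the regions better.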

J-D
On Thu, Sep 19, 2013 at 9:52 AM, Dolan Antenucci <[EMAIL PROTECTED]> wrote:

> I have about 1 billion values I am trying to load into a new HBase table
> (with just one column and column family), but am running into some issues.
>  Currently I am trying to use MapReduce to import these by first converting
> them to HFiles and then using LoadIncrementalHFiles.doBulkLoad().  I also
> use HFileOutputFormat.configureIncrementalLoad() as part of my MR job.  My
> code is essentially the same as this example:
>
> https://github.com/Paschalis/HBase-Bulk-Load-Example/blob/master/src/cy/ac/ucy/paschalis/hbase/bulkimport/Driver.java
>
> The problem I'm running into is that only 1 reducer is created
> by configureIncrementalLoad(), and there is not enough space on this node
> to handle all this data.  configureIncrementalLoad() should start one
> reducer for every region the table has, so apparently the table only has 1
> region -- maybe because it is empty and brand new (my understanding of how
> regions work is not crystal clear)?  The cluster has 5 region servers, so
> I'd at least like that many reducers to handle this loading.
>
> On a side note, I also tried the command line tool, completebulkload, but
> am running into other issues with this (timeouts, possible heap issues) --
> probably because only one server is assigned the task of inserting all
> the records (when I look at the region servers' logs, only one of the
> servers has log entries; the rest are idle).
>
> Any help is appreciated.
>
> -Dolan Antenucci
>
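The single reducer follows from how configureIncrementalLoad() sets up the job: per its documentation, it sets the number of reduce tasks to the table's current number of regions and configures a TotalOrderPartitioner to match the region boundaries. A brand-new table has one region whose start key is empty, so every row falls into one partition. The sketch below is a simplified model of that routing, not HBase's actual partitioner code; `RegionPartitionSketch`, `reducerFor`, and `b` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * Simplified model (not HBase code): a row key goes to the reducer whose
 * region range [startKey, nextStartKey) contains it.
 */
public class RegionPartitionSketch {

    // Unsigned lexicographic byte comparison, the order HBase uses for row keys.
    public static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    /** Index of the last region start key <= row; the first start key must be empty. */
    public static int reducerFor(byte[] row, List<byte[]> regionStartKeys) {
        int lo = 0, hi = regionStartKeys.size() - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (compare(regionStartKeys.get(mid), row) <= 0) {
                lo = mid;      // this region starts at or before the row
            } else {
                hi = mid - 1;  // this region starts after the row
            }
        }
        return lo;
    }

    public static byte[] b(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Brand-new table: a single region with an empty start key.
        List<byte[]> fresh = Collections.singletonList(new byte[0]);
        System.out.println(reducerFor(b("9f00"), fresh)); // 0

        // Pre-split table with 5 regions: rows fan out across 5 reducers.
        List<byte[]> split = Arrays.asList(new byte[0], b("33"), b("66"), b("99"), b("cc"));
        System.out.println(reducerFor(b("0a"), split));   // 0
        System.out.println(reducerFor(b("9f00"), split)); // 3
        System.out.println(reducerFor(b("ff"), split));   // 4
    }
}
```

The same region layout likely explains the completebulkload observation as well: each HFile is loaded into the region covering its keys, so with a single region all of the loading lands on the one region server hosting it.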