HBase >> mail # user >> Bulkload into empty table with configureIncrementalLoad()


Re: Bulkload into empty table with configureIncrementalLoad()
You need to create the table with pre-splits. A brand-new table starts with a
single region, which is why you get only one reducer. See
http://hbase.apache.org/book.html#perf.writing
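
As a rough illustration of what pre-splitting involves, here is a minimal sketch that computes split keys for a fixed number of regions. It assumes row keys are spread roughly evenly over the first byte (for example, hashed or salted keys), which the original question does not state; the class and method names are hypothetical. If the real keys are skewed (plain ASCII text, say), the boundaries should come from a sample of the actual keys instead.

```java
// Hypothetical helper: not part of HBase, just plain Java for illustration.
class PreSplitSketch {

    // Returns (numRegions - 1) one-byte split keys that cut the first-byte
    // range 0x00..0xFF into numRegions roughly equal slices.
    // Assumption: row keys are uniformly distributed on their first byte.
    static byte[][] evenSplitKeys(int numRegions) {
        if (numRegions < 2 || numRegions > 256) {
            throw new IllegalArgumentException("numRegions must be in 2..256");
        }
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            int boundary = (i * 256) / numRegions; // start of region i
            splits[i - 1] = new byte[] { (byte) boundary };
        }
        return splits;
    }

    public static void main(String[] args) {
        // Five region servers in the question, so aim for five regions.
        byte[][] splits = evenSplitKeys(5);
        for (byte[] s : splits) {
            System.out.printf("split key: 0x%02X%n", s[0] & 0xFF);
        }
        // With the HBase client on the classpath, the array would be passed
        // when creating the table, e.g. via the createTable overload that
        // accepts split keys:
        //   admin.createTable(tableDescriptor, splits);
    }
}
```

Once the table exists with five regions, configureIncrementalLoad() should set up one reducer per region, and the resulting HFiles map onto regions hosted by different servers rather than a single one.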

J-D
On Thu, Sep 19, 2013 at 9:52 AM, Dolan Antenucci <[EMAIL PROTECTED]> wrote:

> I have about 1 billion values I am trying to load into a new HBase table
> (with just one column and column family), but am running into some issues.
>  Currently I am trying to use MapReduce to import these by first converting
> them to HFiles and then using LoadIncrementalHFiles.doBulkLoad().  I also
> use HFileOutputFormat.configureIncrementalLoad() as part of my MR job.  My
> code is essentially the same as this example:
>
> https://github.com/Paschalis/HBase-Bulk-Load-Example/blob/master/src/cy/ac/ucy/paschalis/hbase/bulkimport/Driver.java
>
> The problem I'm running into is that only 1 reducer is created
> by configureIncrementalLoad(), and there is not enough space on this node
> to handle all this data.  configureIncrementalLoad() should start one
> reducer for every region the table has, so apparently the table only has 1
> region -- maybe because it is empty and brand new (my understanding of how
> regions work is not crystal clear)?  The cluster has 5 region servers, so
> I'd at least like that many reducers to handle this loading.
>
> On a side note, I also tried the command line tool, completebulkload, but
> am running into other issues with this (timeouts, possible heap issues) --
> probably due to only one server being assigned the task of inserting all
> the records (i.e. I look at the region servers' logs, and only one of the
> servers has log entries; the rest are idle).
>
> Any help is appreciated
>
> -Dolan Antenucci
>