HBase >> mail # user >> Bulkload into empty table with configureIncrementalLoad()


Re: Bulkload into empty table with configureIncrementalLoad()
Thanks J-D.  Any recommendations on how to determine what splits to use?
For the keys I'm using strings, so I wasn't sure what to put for my startKey
and endKey. For the number of regions, I have a table pre-populated with the
same data (not using bulk load), so I can see that it has 68 regions.
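One possible way to pick splits for string keys, sketched here under assumptions rather than as a recommended recipe: take a sorted sample of the row keys (for example from the existing 68-region table) and use evenly spaced entries as split points. The class and method names below (SplitKeys, pickSplitKeys, toBytes) are hypothetical and not part of HBase. The sketch assumes ASCII keys, so that Java string order matches HBase's byte-wise key order.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HBase: derives split keys from a sorted key sample.
final class SplitKeys {

    // Returns up to (numRegions - 1) split keys taken at evenly spaced
    // positions in the sorted sample. Duplicate candidates are skipped.
    static List<String> pickSplitKeys(List<String> sortedSample, int numRegions) {
        if (numRegions < 1) {
            throw new IllegalArgumentException("numRegions must be >= 1");
        }
        List<String> splits = new ArrayList<>();
        int n = sortedSample.size();
        if (n == 0) {
            return splits; // nothing to sample from, so no split points
        }
        for (int i = 1; i < numRegions; i++) {
            int idx = (int) ((long) i * n / numRegions); // always < n because i < numRegions
            String candidate = sortedSample.get(idx);
            if (splits.isEmpty() || !candidate.equals(splits.get(splits.size() - 1))) {
                splits.add(candidate);
            }
        }
        return splits;
    }

    // Converts the split keys to UTF-8 bytes, the form a byte[][] splitKeys
    // argument would need.
    static byte[][] toBytes(List<String> splits) {
        byte[][] out = new byte[splits.size()][];
        for (int i = 0; i < splits.size(); i++) {
            out[i] = splits.get(i).getBytes(StandardCharsets.UTF_8);
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed toy sample; a real sample would come from the existing data.
        List<String> sample = Arrays.asList("a", "b", "c", "d", "e", "f", "g", "h");
        List<String> splits = pickSplitKeys(sample, 4);
        System.out.println(splits + " -> " + toBytes(splits).length + " split keys");
    }
}
```

With 68 target regions this would yield at most 67 split keys. The resulting byte[][] is the kind of argument that HBaseAdmin.createTable(HTableDescriptor, byte[][]) accepts, and a table created with N split keys starts with N + 1 regions, so configureIncrementalLoad() should then set up N + 1 reducers.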
On Thu, Sep 19, 2013 at 12:55 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:

> You need to create the table with pre-splits, see
> http://hbase.apache.org/book.html#perf.writing
>
> J-D
>
>
> On Thu, Sep 19, 2013 at 9:52 AM, Dolan Antenucci <[EMAIL PROTECTED]> wrote:
>
> > I have about 1 billion values I am trying to load into a new HBase table
> > (with just one column and column family), but am running into some issues.
> > Currently I am trying to use MapReduce to import these by first converting
> > them to HFiles and then using LoadIncrementalHFiles.doBulkLoad().  I also
> > use HFileOutputFormat.configureIncrementalLoad() as part of my MR job.  My
> > code is essentially the same as this example:
> >
> > https://github.com/Paschalis/HBase-Bulk-Load-Example/blob/master/src/cy/ac/ucy/paschalis/hbase/bulkimport/Driver.java
> >
> > The problem I'm running into is that only 1 reducer is created
> > by configureIncrementalLoad(), and there is not enough space on this node
> > to handle all this data.  configureIncrementalLoad() should start one
> > reducer for every region the table has, so apparently the table only has 1
> > region -- maybe because it is empty and brand new (my understanding of how
> > regions work is not crystal clear)?  The cluster has 5 region servers, so
> > I'd at least like that many reducers to handle this loading.
> >
> > On a side note, I also tried the command line tool, completebulkload, but
> > am running into other issues with this (timeouts, possible heap issues) --
> > probably due to only one server being assigned the task of inserting all
> > the records (i.e. I look at the region servers' logs, and only one of the
> > servers has log entries; the rest are idle).
> >
> > Any help is appreciated
> >
> > -Dolan Antenucci
> >
>