HBase >> mail # user >> Pre-split Region Boundaries

Re: Pre-split Region Boundaries
We also pre-split our tables before bulk loading, but we don't use
RegionSplitter.

We split manually (we did some testing and found the optimal split points):
for each region we put a new HRegionInfo into the .META. table and then
assign that region (HBaseAdmin.assign("region name")). After you finish
assigning all the regions, don't forget to clear the region cache.

I know it's a little bit "intrusive" but it works for us.
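
If evenly spaced boundaries are a good enough starting point, the split keys themselves are easy to compute by hand. The following is only a sketch, not what we run: it assumes row keys are lowercase hex strings compared on their first 8 characters, and the class and method names (FullRangeSplits, splitKeys) are made up for illustration. The resulting keys could then be turned into bytes and used as region boundaries, for example through HBaseAdmin.createTable(HTableDescriptor, byte[][]) rather than by editing .META. directly.

```java
import java.util.ArrayList;
import java.util.List;

public class FullRangeSplits {
    // Returns numRegions - 1 split keys that cut the full 32-bit hex key
    // space (00000000 .. ffffffff) into numRegions roughly equal regions.
    // Assumption: row keys are lowercase hex strings, so 8 hex digits are
    // enough to define the boundaries.
    static List<String> splitKeys(int numRegions) {
        long space = 1L << 32; // number of distinct 8-digit hex prefixes
        List<String> keys = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            long boundary = space * i / numRegions; // fits easily in a long
            keys.add(String.format("%08x", boundary));
        }
        return keys;
    }

    public static void main(String[] args) {
        // Print the boundaries for a 200-region table.
        for (String key : splitKeys(200)) {
            System.out.println(key);
        }
    }
}
```

With 4 regions this gives 40000000, 80000000 and c0000000, so the last region starts three quarters of the way through the key space instead of stopping at the halfway point.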

On Fri, Jan 25, 2013 at 4:45 PM, Rob Styles <[EMAIL PROTECTED]> wrote:

> Hi,
> I'm tuning HBase for storage of a few billion rows and, more or less, bulk
> loading.
> I'm using MD5 strings as row ids to create an evenly distributed range and
> non-sequential values during loading and this is working relatively well
> for us.
> I've pre-split my tables using org.apache.hadoop.hbase.util.RegionSplitter
> from the command line and had expected it to create regions covering 00000
> - fffff as per the docs. My regions come out different though, before
> loading any data.
> With 200 regions the first region ends with 00a3d70a and the regions go up
> from there. The last region has a start key of 7f5c28c6 which is only
> half-way through the address space. This means my last region gets hot
> during loading.
> I know I must have missed something but not sure what. Any help greatly
> appreciated.
> thanks
> rob
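
The two boundaries quoted above can be reproduced with plain arithmetic. The sketch below assumes the splitter divided the range 00000000 to 7fffffff (not ffffffff) into 200 equal steps using integer division; that upper bound is a guess that fits the reported numbers, not something confirmed in this thread, and the class and method names are illustrative only.

```java
public class SplitCheck {
    // Start key of region i when [0, max] is cut into n equal-width regions.
    // The step is computed with integer division, then formatted as
    // 8 lowercase hex digits.
    static String boundary(int i, int n, long max) {
        long step = max / n;
        return String.format("%08x", i * step);
    }

    public static void main(String[] args) {
        long assumedMax = 0x7FFFFFFFL; // assumption: only half the hex space
        // End key of the first region (start key of the second).
        System.out.println(boundary(1, 200, assumedMax));   // 00a3d70a
        // Start key of the last region.
        System.out.println(boundary(199, 200, assumedMax)); // 7f5c28c6
    }
}
```

Both values match the ones reported, which suggests the regions were laid out over only the lower half of the hex range. Any MD5 key starting with 8 through f would then fall into the last region, which would explain why it gets hot.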