We are pre-splitting our tables before bulk loading too, but we don't use RegionSplitter.
We split manually (we did some testing to find the optimal split points)
by putting a new HRegionInfo into the .META. table, assigning that region
with HBaseAdmin.assign("region name"), and, after you finish assigning all
the regions, don't forget to clear the region cache.
I know it's a little bit "intrusive" but it works for us.
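For reference, evenly spaced split points over the full 32-bit hex keyspace can be computed with plain Java before creating the table. This is a minimal sketch, not our exact code: the 8-hex-digit key width and the 200-region count are taken from the mail below, and the helper name is hypothetical.

```java
import java.math.BigInteger;

public class SplitPoints {
    // Compute (numRegions - 1) evenly spaced split keys over the full
    // unsigned 32-bit hex keyspace: 00000000 .. ffffffff.
    static String[] hexSplits(int numRegions) {
        BigInteger range = BigInteger.ONE.shiftLeft(32); // 2^32
        String[] splits = new String[numRegions - 1];
        for (int i = 1; i < numRegions; i++) {
            // i-th split point = floor(i * 2^32 / numRegions)
            BigInteger point = range.multiply(BigInteger.valueOf(i))
                                    .divide(BigInteger.valueOf(numRegions));
            splits[i - 1] = String.format("%08x", point.longValue());
        }
        return splits;
    }

    public static void main(String[] args) {
        String[] splits = hexSplits(200);
        System.out.println("first split: " + splits[0]);
        System.out.println("middle split: " + splits[99]);
        System.out.println("last split: " + splits[splits.length - 1]);
    }
}
```

If you want something less intrusive than editing .META. by hand, keys like these (as byte[][]) can also be handed to HBaseAdmin.createTable(descriptor, splitKeys) at table-creation time.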
On Fri, Jan 25, 2013 at 4:45 PM, Rob Styles <[EMAIL PROTECTED]> wrote:
> I'm tuning HBase for storage of a few billion rows and, more or less, bulk
> loading them.
> I'm using MD5 strings as row ids to get an evenly distributed key range
> and non-sequential values during loading, and this is working relatively
> well for us.
> I've pre-split my tables using org.apache.hadoop.hbase.util.RegionSplitter
> from the command line and had expected it to create regions covering 00000
> - fffff as per the docs. My regions come out different though, before
> loading any data.
> With 200 regions the first region ends with 00a3d70a and the regions go up
> from there. The last region has a start key of 7f5c28c6 which is only
> half-way through the address space. This means my last region gets hot
> during loading.
> I know I must have missed something but not sure what. Any help greatly
> appreciated.
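For what it's worth, the keys reported above are exactly consistent with the splitter dividing only the signed 32-bit half of the keyspace, [0, Integer.MAX_VALUE], instead of the full unsigned range up to 0xffffffff. That hypothesis is my assumption, but the arithmetic checks out (200 regions, as stated above):

```java
public class SplitCheck {
    public static void main(String[] args) {
        // Hypothesis: split size = Integer.MAX_VALUE / 200 (integer division),
        // with split points at i * splitSize for i = 1..199.
        long splitSize = Integer.MAX_VALUE / 200L;
        System.out.printf("first split: %08x%n", splitSize);       // 00a3d70a
        System.out.printf("last split:  %08x%n", splitSize * 199); // 7f5c28c6
    }
}
```

Both values match the region boundaries observed above, which would explain why the last region covers the entire upper half of the MD5 key range and runs hot during loading.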