It can be reasonable to turn off automatic region splitting if you know
your rowkey distribution well and can keep the load spread evenly across
your region servers yourself, either manually or through the HBase API.
Sometimes it is even the best way to keep the number of regions to a
minimum (many companies do this). The Reference Guide has an example of
pre-splitting regions.
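
To give a rough idea of what pre-splitting involves, here is a minimal
Python sketch that computes evenly spaced split keys. It assumes your
rowkeys start with a fixed-width, uniformly distributed lowercase hex
prefix (a hash of the natural key, for instance), which may not match your
schema. The function name and the key_width parameter are made up for
illustration and are not part of HBase.

```python
def hex_split_points(num_regions, key_width=8):
    """Return num_regions - 1 evenly spaced split keys as hex strings.

    Assumption: rowkeys begin with a uniformly distributed lowercase hex
    prefix of key_width characters. Skewed keys need hand-picked splits.
    """
    if num_regions < 2:
        return []  # a single region needs no split keys
    key_space = 16 ** key_width
    return [
        format(i * key_space // num_regions, "0{}x".format(key_width))
        for i in range(1, num_regions)
    ]


if __name__ == "__main__":
    # Four regions need three boundaries.
    for key in hex_split_points(4):
        print(key)
```

The resulting keys would be handed to the table-creation call as the
initial region boundaries. If your keys are not uniformly distributed, you
would sample real rowkeys instead and pick the boundaries from the sample.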
As for your region size, consider raising it to 2 GB or even more; that
will help reduce the number of regions and StoreFiles.
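
As a back-of-the-envelope check, the Python sketch below estimates the
smallest region count you could reach for a few values of
hbase.hregion.max.filesize. It uses the total size and server count from
your message and assumes every region is filled right up to the limit, so
treat the result as a lower bound only: the /hbase folder also holds logs
and other files, and real regions are never perfectly full. The function
name is made up for illustration.

```python
TOTAL_BYTES = 926939501499  # from `hadoop fs -dus /hbase` in the original post
REGION_SERVERS = 5          # from the original post
MB = 1024 ** 2
GB = 1024 ** 3


def min_regions(total_bytes, max_filesize):
    """Lower bound on region count if every region were completely full."""
    return (total_bytes + max_filesize - 1) // max_filesize  # integer ceiling


if __name__ == "__main__":
    for label, size in [("256 MB", 256 * MB), ("1 GB", GB),
                        ("2 GB", 2 * GB), ("4 GB", 4 * GB)]:
        total = min_regions(TOTAL_BYTES, size)
        per_server = (total + REGION_SERVERS - 1) // REGION_SERVERS
        print("{:>6}: at least {} regions, about {} per server".format(
            label, total, per_server))
```

With 2 GB regions this gives a floor of 432 regions, or roughly 87 per
server, compared with the 1010 per server you have now.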
On Fri, Jul 13, 2012 at 10:31 PM, Rob Roland <[EMAIL PROTECTED]> wrote:
> Hi all,
> The HBase instance I'm managing has grown to the point that it has way too
> many regions per server - 5 region servers with 1010 regions each on HBase
> 0.90.4-cdh3u2. I want to bring this region count under control. The
> cluster is currently running with the default region size of 256 mb, and
> the data is spread across 17 tables. I've turned on compression for all
> the column families, which is great, as my region count is growing much
> slower now. I've looked through HDFS at the individual regions, and they
> seem rather small - 40-50 mb - which is not surprising due to major
> compactions after enabling compression. My total hbase folder size in HDFS
> (hadoop fs -dus /hbase) is 926,939,501,499 bytes.
> My question is - what's the best strategy for handling this?
> What I assume from reading the docs:
> 1. Increase the hbase.hregion.max.filesize to something more reasonable,
> like 2 GB.
> 2. Bring the cluster offline and merge regions.
> Is there a good way to determine the actual region sizes, other than
> manually, that way I can do the merges to end up with the most efficient
> regions, size-wise?
> At what point is it a good idea to turn off automatic region splits and
> manually manage them?
> Rob Roland
> Senior Software Engineer
> Simply Measured, Inc.