Re: Too many regions
It can be reasonable to turn off automatic region splitting if you know
your rowkey distribution well and can easily keep the load spread evenly
across your regionservers (i.e., by splitting manually or through the HBase
API). Sometimes it is even the best way to keep the number of regions to a
minimum (many companies do this). The Reference Guide has an example of
pre-splitting regions.
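
To make the pre-splitting idea concrete, here is a rough sketch of how split keys could be computed. It assumes rowkeys start with a uniformly distributed hex prefix (for example a hash of the natural key), which may not match your schema. The function name and parameters are hypothetical and are not part of any HBase API; the resulting keys would still have to be handed to HBase when the table is created.

```python
def hex_split_points(num_regions, width=8):
    """Return num_regions - 1 evenly spaced split keys as hex strings.

    Assumption: rowkeys begin with a `width`-character lowercase hex prefix
    that is uniformly distributed, so equal slices of the keyspace receive
    roughly equal load.
    """
    if num_regions < 2:
        # A single region needs no split points.
        return []
    keyspace = 16 ** width
    return [
        format(i * keyspace // num_regions, "0{}x".format(width))
        for i in range(1, num_regions)
    ]


if __name__ == "__main__":
    # Split keys for 4 regions over a 2-character hex prefix.
    print(hex_split_points(4, width=2))
```

With a skewed rowkey distribution, evenly spaced keys like these would produce uneven regions, so the split points would need to come from sampling the real keys instead.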

As for your region size, increasing it to 2 GB or even more will help
reduce the number of regions and StoreFiles.
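
As a back-of-the-envelope check, the arithmetic can be sketched as below. It uses the total size and server count from your message, and assumes 2 GB means 2 * 1024^3 bytes and that every region is filled right up to the maximum. That second assumption never holds in practice (and each of the 17 tables needs at least one region), so treat the result as a lower bound. The function name is just illustrative.

```python
def min_region_count(total_bytes, max_region_bytes):
    """Lower bound on the region count if every region were completely full."""
    # Ceiling division without floating point.
    return -(-total_bytes // max_region_bytes)


if __name__ == "__main__":
    total = 926939501499       # from `hadoop fs -dus /hbase` in the original message
    servers = 5                # region servers in the cluster
    two_gb = 2 * 1024 ** 3     # assumed meaning of "2 GB"

    regions = min_region_count(total, two_gb)
    per_server = -(-regions // servers)
    print(regions, "regions in total, about", per_server, "per server")
```

On those numbers the bound comes out at 432 regions, or about 87 per server, compared with the 1010 per server you have today.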

On Fri, Jul 13, 2012 at 10:31 PM, Rob Roland <[EMAIL PROTECTED]> wrote:

> Hi all,
>
> The HBase instance I'm managing has grown to the point that it has way too
> many regions per server - 5 region servers with 1010 regions each on HBase
> 0.90.4-cdh3u2.  I want to bring this region count under control. The
> cluster is currently running with the default region size of 256 MB, and
> the data is spread across 17 tables.   I've turned on compression for all
> the column families, which is great, as my region count is growing much
> slower now. I've looked through HDFS at the individual regions, and they
> seem rather small - 40-50 MB - which is not surprising due to major
> compactions after enabling compression.  My total hbase folder size in HDFS
> (hadoop fs -dus /hbase) is 926,939,501,499 bytes.
>
> My question is - what's the best strategy for handling this?
>
> What I assume from reading the docs:
>
> 1. Increase the hbase.hregion.max.filesize to something more reasonable,
> like 2 GB.
> 2. Bring the cluster offline and merge regions.
>
> Is there a good way to determine the actual region sizes, other than
> manually, that way I can do the merges to end up with the most efficient
> regions, size-wise?
>
> At what point is it a good idea to turn off automatic region splits and
> manually manage them?
>
> Thanks,
>
> Rob Roland
> Senior Software Engineer
> Simply Measured, Inc.
>

--
Adrien Mogenet
06.59.16.64.22
http://www.mogenet.me