Re: Too many regions
Everyone will tell you that handling fewer regions is always better.
Depending on your setup, data size, and number of records, I would say
that 1 to 5 regions per table per server is acceptable. In some setups
(one big table, for example) you can see up to 100-200 regions per
server, which is about the maximum you should keep in mind (the
Reference Guide talks about "a few hundred", as far as I remember).
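
To see how your own cluster compares, the master's cluster status can be
dumped programmatically. A minimal sketch against the 0.90-era Java client
(HBaseAdmin, ClusterStatus, HServerInfo; these class and method names
shifted in later releases, so treat this as illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.ClusterStatus;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HServerInfo;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class RegionCounts {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        ClusterStatus status = admin.getClusterStatus();
        // Print each regionserver with the number of regions it carries.
        for (HServerInfo server : status.getServerInfo()) {
          System.out.println(server.getServerName() + ": "
              + server.getLoad().getNumberOfRegions() + " regions");
        }
      }
    }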

On Fri, Jul 13, 2012 at 11:14 PM, Rob Roland <[EMAIL PROTECTED]> wrote:

> In almost every table, the rowkey is either a SHA hash, or a SHA hash and a
> timestamp, so we have a fairly even distribution of rowkeys now.
>
> Is there a best practice for the number of regions per table per server?
>  Meaning, with our 17 tables and 10 regions per table on each of the 5
> region servers, that's 170 regions per region server - would that be good?
>
> Thanks for the feedback,
>
> Rob
>
> On Fri, Jul 13, 2012 at 1:58 PM, Adrien Mogenet <[EMAIL PROTECTED]> wrote:
>
> > It can be reasonable to turn off automatic region splitting if you know
> > your rowkey distribution well and you're able to ensure good parallelism
> > among your regionservers "easily" (i.e. manually or through the HBase
> > API). Sometimes it's even the best way to keep the number of regions to
> > a minimum (many companies do this). There is an example of pre-splitting
> > regions in the Reference Guide.
> >
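
For reference, pre-splitting at table-creation time through the Java client
looks roughly like the sketch below. The table name and column family are
made up, and the split points assume hex-encoded SHA rowkeys (as Rob
describes further down), so sixteen evenly spaced regions fall out of
splitting on the first hex digit:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PreSplitTable {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        HTableDescriptor desc = new HTableDescriptor("mytable"); // hypothetical
        desc.addFamily(new HColumnDescriptor("d"));

        // 15 split keys give 16 regions, evenly spaced over the first
        // hex digit of a hex-encoded SHA rowkey.
        byte[][] splits = new byte[15][];
        for (int i = 0; i < splits.length; i++) {
          splits[i] = Bytes.toBytes(Integer.toHexString(i + 1));
        }
        admin.createTable(desc, splits);
      }
    }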
> > As for your region size, increasing it to 2 GB or even more will help
> > reduce the number of regions and StoreFiles.
> >
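The size can be raised cluster-wide via hbase.hregion.max.filesize in
hbase-site.xml, or per table through the table descriptor. A sketch of the
per-table route on the 0.90-era API (table name hypothetical; note the
table has to be disabled while it is modified):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class RaiseMaxFileSize {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        byte[] table = Bytes.toBytes("mytable");  // hypothetical name

        HTableDescriptor desc = admin.getTableDescriptor(table);
        desc.setMaxFileSize(2L * 1024 * 1024 * 1024);  // 2 GB per region

        admin.disableTable(table);  // table must be offline to modify it
        admin.modifyTable(table, desc);
        admin.enableTable(table);
      }
    }
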
> > On Fri, Jul 13, 2012 at 10:31 PM, Rob Roland <[EMAIL PROTECTED]>
> > wrote:
> >
> > > Hi all,
> > >
> > > The HBase instance I'm managing has grown to the point that it has way
> > > too many regions per server - 5 region servers with 1010 regions each
> > > on HBase 0.90.4-cdh3u2. I want to bring this region count under
> > > control. The cluster is currently running with the default region size
> > > of 256 MB, and the data is spread across 17 tables. I've turned on
> > > compression for all the column families, which is great, as my region
> > > count is growing much more slowly now. I've looked through HDFS at the
> > > individual regions, and they seem rather small - 40-50 MB - which is
> > > not surprising given the major compactions after enabling compression.
> > > My total hbase folder size in HDFS (hadoop fs -dus /hbase) is
> > > 926,939,501,499 bytes.
> > >
> > > My question is - what's the best strategy for handling this?
> > >
> > > What I assume from reading the docs:
> > >
> > > 1. Increase hbase.hregion.max.filesize to something more reasonable,
> > >    like 2 GB.
> > > 2. Bring the cluster offline and merge regions (see the sketch after
> > >    this list).
> > >
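For the offline merge in step 2, 0.90 ships a command-line tool,
org.apache.hadoop.hbase.util.Merge, that joins two adjacent regions of a
table while HBase is completely shut down. Roughly as follows - the table
and region names here are placeholders; take real region names from .META.
or the master web UI:

    # HBase must be fully stopped before running this.
    $ bin/hbase org.apache.hadoop.hbase.util.Merge mytable \
        "mytable,,1342222222222" "mytable,80000000,1342222222223"
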
> > > Is there a good way to determine the actual region sizes, other than
> > > manually, so that I can do the merges to end up with the most
> > > efficiently sized regions?
> > >
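On the sizing question, one scriptable option is to walk the table
directory in HDFS and sum each region directory, instead of eyeballing
hadoop fs -du output. A rough sketch with the Hadoop FileSystem API,
assuming the 0.90 layout of /hbase/<table>/<region-dir> and a hypothetical
table name:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RegionSizes {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Every directory under the table dir is (roughly) one region.
        for (FileStatus entry : fs.listStatus(new Path("/hbase/mytable"))) {
          if (!entry.isDir()) continue;
          long bytes = fs.getContentSummary(entry.getPath()).getLength();
          System.out.println(entry.getPath().getName() + "\t" + bytes);
        }
      }
    }

Sorting that output makes it easy to pick out the small adjacent regions
worth merging first.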
> > > At what point is it a good idea to turn off automatic region splits and
> > > manually manage them?
> > >
> > > Thanks,
> > >
> > > Rob Roland
> > > Senior Software Engineer
> > > Simply Measured, Inc.
> > >
> >
> >
> >
> > --
> > Adrien Mogenet
> > 06.59.16.64.22
> > http://www.mogenet.me
> >
>

--
Adrien Mogenet
06.59.16.64.22
http://www.mogenet.me