Re: Too many regions
Everyone will tell you that handling fewer regions is always better.
Depending on your setup, data size, and number of records, I would say
that 1 to 5 regions per table per server is acceptable. In some setups
(one big table, for example) you can see up to 100-200 regions per
server, which is about the maximum you should keep in mind (the
Reference Guide talks about "a few hundred", as far as I remember).
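
To put numbers on it with your data: ~927 GB total (926,939,501,499
bytes) spread over 5 region servers at a 2 GB region size works out to
roughly 927 / 2 / 5 ≈ 93 regions per server, comfortably below "a few
hundred".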

On Fri, Jul 13, 2012 at 11:14 PM, Rob Roland <[EMAIL PROTECTED]> wrote:

> In almost every table, the rowkey is either a SHA hash, or a SHA hash
> and a timestamp, so we have a fairly even distribution of rowkeys now.
>
> Is there a best practice for the number of regions of a table per
> server? Meaning, with 5 region servers and 10 regions per table per
> server across our 17 tables, so 170 regions per region server, would
> that be good?
>
> Thanks for the feedback,
>
> Rob
>
> On Fri, Jul 13, 2012 at 1:58 PM, Adrien Mogenet <[EMAIL PROTECTED]> wrote:
>
> > It can be reasonable to turn off automatic region splits if you know
> > your rowkey distribution well and you can "easily" ensure good
> > parallelism among your region servers (i.e., manually or through the
> > HBase API). Sometimes it's even the best solution for keeping the
> > number of regions to a minimum (many companies do this). There is an
> > example of pre-splitting regions in the Reference Guide.
> >
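> > A rough sketch of pre-splitting through the client API at table
> > creation (class, table, and family names are made up; 16-way hex
> > splits suit SHA-hashed rowkeys like yours):
> >
> >   import org.apache.hadoop.conf.Configuration;
> >   import org.apache.hadoop.hbase.HBaseConfiguration;
> >   import org.apache.hadoop.hbase.HColumnDescriptor;
> >   import org.apache.hadoop.hbase.HTableDescriptor;
> >   import org.apache.hadoop.hbase.client.HBaseAdmin;
> >   import org.apache.hadoop.hbase.util.Bytes;
> >
> >   public class PresplitExample {
> >     public static void main(String[] args) throws Exception {
> >       Configuration conf = HBaseConfiguration.create();
> >       HBaseAdmin admin = new HBaseAdmin(conf);
> >       HTableDescriptor desc = new HTableDescriptor("mytable");
> >       desc.addFamily(new HColumnDescriptor("cf"));
> >       // 15 split keys ("1" .. "f") yield 16 regions, one per
> >       // leading hex digit of the rowkey.
> >       String hexDigits = "123456789abcdef";
> >       byte[][] splits = new byte[hexDigits.length()][];
> >       for (int i = 0; i < splits.length; i++) {
> >         splits[i] = Bytes.toBytes(hexDigits.substring(i, i + 1));
> >       }
> >       admin.createTable(desc, splits);
> >     }
> >   }
> >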
> > As for your region size, increasing it to 2 GB or even more will
> > help reduce the number of regions and StoreFiles.
> >
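> > For example, in hbase-site.xml (value is in bytes; 2 GB shown;
> > region servers must be restarted to pick it up):
> >
> >   <property>
> >     <name>hbase.hregion.max.filesize</name>
> >     <value>2147483648</value>
> >   </property>
> >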
> > On Fri, Jul 13, 2012 at 10:31 PM, Rob Roland <[EMAIL PROTECTED]> wrote:
> >
> > > Hi all,
> > >
> > > The HBase instance I'm managing has grown to the point that it has
> > > way too many regions per server - 5 region servers with 1010
> > > regions each, on HBase 0.90.4-cdh3u2. I want to bring this region
> > > count under control. The cluster is currently running with the
> > > default region size of 256 MB, and the data is spread across 17
> > > tables. I've turned on compression for all the column families,
> > > which is great, as my region count is growing much more slowly now.
> > > I've looked through HDFS at the individual regions, and they seem
> > > rather small - 40-50 MB - which is not surprising given the major
> > > compactions after enabling compression. My total hbase folder size
> > > in HDFS (hadoop fs -dus /hbase) is 926,939,501,499 bytes.
> > >
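> > > For a per-region breakdown (table name below is a placeholder),
> > > each region is a directory under its table's directory in HDFS, so
> > > a listing like this shows the size of every region in a table:
> > >
> > >   hadoop fs -du /hbase/mytable
> > >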
> > > My question is - what's the best strategy for handling this?
> > >
> > > What I assume from reading the docs:
> > >
> > > 1. Increase hbase.hregion.max.filesize to something more
> > >    reasonable, like 2 GB.
> > > 2. Bring the cluster offline and merge regions (see the sketch
> > >    after this list).
> > >
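> > > As a hedged sketch of step 2: with the cluster completely stopped,
> > > the offline Merge utility joins two adjacent regions of a table
> > > (the region-name arguments below are placeholders; the real names
> > > come from .META. or the master web UI):
> > >
> > >   ./bin/hbase org.apache.hadoop.hbase.util.Merge <tablename> \
> > >       <region1-name> <region2-name>
> > >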
> > > Is there a good way to determine the actual region sizes, other
> > > than manually, so that I can do the merges to end up with the most
> > > size-efficient regions?
> > >
> > > At what point is it a good idea to turn off automatic region splits and
> > > manually manage them?
> > >
> > > Thanks,
> > >
> > > Rob Roland
> > > Senior Software Engineer
> > > Simply Measured, Inc.
> > >
> >
> >
> >
> > --
> > Adrien Mogenet
> > 06.59.16.64.22
> > http://www.mogenet.me
> >
>

--
Adrien Mogenet
06.59.16.64.22
http://www.mogenet.me