HBase >> mail # user >> Too many regions


Re: Too many regions
Tables are a loose organizational structure. They let you apply more granular per-table configuration, or simply separate your data logically. There aren't any best practices with regard to regions per table. What matters more is regions per region server, and the number of regions relative to the data you query.

The former is obvious: you don't want more than a few hundred (100-300) regions per region server. By the latter I mean that you generally want "just enough" regions for the data you are trying to query. If you have too few, you won't benefit from the distributed nature of HBase. But if you have too many, you will go over the recommended few hundred regions per region server.
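A quick way to sanity-check a layout against the 100-300 rule of thumb is to divide total regions by region servers. The sketch below assumes regions are evenly balanced across servers. The function names and the "below/within/above" labels are illustrative, not part of any HBase API, and the example figures come from Rob's messages quoted below.

```python
# Hypothetical helpers: compare average regions per server with the
# "few hundred (100-300)" rule of thumb from the reply above.
GUIDELINE_LOW = 100
GUIDELINE_HIGH = 300


def regions_per_server(total_regions: int, region_servers: int) -> float:
    """Average regions per server, assuming an even balance."""
    if region_servers <= 0:
        raise ValueError("need at least one region server")
    return total_regions / region_servers


def assess(per_server: float) -> str:
    """Label a per-server count relative to the 100-300 range."""
    if per_server > GUIDELINE_HIGH:
        return "above"
    if per_server < GUIDELINE_LOW:
        return "below"
    return "within"


if __name__ == "__main__":
    # Current cluster from the thread: 5 servers x 1010 regions each.
    current = regions_per_server(5 * 1010, 5)
    print(current, assess(current))  # 1010.0 above
    # Proposed layout: 17 tables x 10 regions per table on each of 5 servers.
    proposed = regions_per_server(17 * 10 * 5, 5)
    print(proposed, assess(proposed))  # 170.0 within
```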

Again, it depends on your use case (how you are loading your data, how much data there is, etc.), but I would generally err on the side of larger regions. It's easier to split large regions than it is to merge too-small ones.
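How much data you have largely determines the region count for a given maximum region size (`hbase.hregion.max.filesize`). The rough sketch below assumes data is spread evenly and that, once splits have happened, each region holds between half and all of the maximum size. Under those assumptions the count falls between total/max and 2 × total/max. It also treats "GB" as 1024³ bytes. Applied to the `hadoop fs -dus /hbase` figure quoted below, the result is only an estimate, since that total may include non-region files such as write-ahead logs. The helper names are hypothetical.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for non-negative a and positive b."""
    return -(-a // b)


def region_count_range(total_bytes: int, max_filesize: int) -> tuple[int, int]:
    """Rough (fewest, most) regions for a given max region size.

    Assumes data is spread evenly and each region holds between
    max_filesize / 2 and max_filesize bytes once splits have happened.
    """
    if total_bytes < 0 or max_filesize <= 0:
        raise ValueError("invalid sizes")
    fewest = ceil_div(total_bytes, max_filesize)
    most = ceil_div(2 * total_bytes, max_filesize)
    return fewest, most


if __name__ == "__main__":
    total = 926_939_501_499  # `hadoop fs -dus /hbase` total from the thread
    servers = 5
    for size in (256 * MIB, 2 * GIB):
        fewest, most = region_count_range(total, size)
        print(f"{size // MIB} MiB: {fewest}-{most} regions, "
              f"{fewest / servers:.0f}-{most / servers:.0f} per server")
```

With the same data, moving from 256 MB to 2 GB regions cuts both bounds by roughly a factor of eight. That is the reduction Adrien points to further down the thread.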

--  
Bryan Beaudreault
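Rob's original question (quoted below) asks how to find actual region sizes without checking each region by hand. One option, sketched here under assumptions, is to run `hadoop fs -du` on a table's directory under /hbase and parse the output. The parser assumes each data line begins with a byte count and ends with the path, because the exact column layout differs between Hadoop versions. It also treats entries whose name starts with "." as table metadata rather than regions. The helper names and the sample paths are hypothetical.

```python
def parse_du(output: str) -> dict[str, int]:
    """Map region directory name -> size in bytes from `hadoop fs -du` text.

    Assumes each data line starts with a byte count and ends with a path;
    other lines (e.g. a "Found N items" header) are skipped.
    """
    sizes: dict[str, int] = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[0].isdigit():
            continue  # header or blank line
        name = fields[-1].rstrip("/").rsplit("/", 1)[-1]
        if name.startswith("."):
            continue  # assumed to be table metadata, not a region
        sizes[name] = int(fields[0])
    return sizes


def small_regions(sizes: dict[str, int], threshold: int) -> list[str]:
    """Names of regions under threshold bytes, smallest first."""
    small = [name for name, size in sizes.items() if size < threshold]
    return sorted(small, key=lambda name: (sizes[name], name))


if __name__ == "__main__":
    # Hypothetical output for a table directory such as /hbase/<table>.
    sample = (
        "Found 3 items\n"
        "52428800   hdfs://namenode/hbase/mytable/1028785192\n"
        "41943040   hdfs://namenode/hbase/mytable/70236052\n"
        "1024       hdfs://namenode/hbase/mytable/.tmp\n"
    )
    sizes = parse_du(sample)
    # Regions under 45 MiB are candidates for merging.
    print(small_regions(sizes, 45 * 1024 ** 2))
```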
On Friday, July 13, 2012 at 5:14 PM, Rob Roland wrote:

> In almost every table, the rowkey is either a SHA hash, or a SHA hash and a
> timestamp, so we have a fairly even distribution of rowkeys now.
>  
> Is there a best practice for the number of regions of a table per server?
> For example, with 5 region servers and 10 regions per table on each server,
> that's 170 regions per region server across our 17 tables. Would that be good?
>  
> Thanks for the feedback,
>  
> Rob
>  
> On Fri, Jul 13, 2012 at 1:58 PM, Adrien Mogenet <[EMAIL PROTECTED] (mailto:[EMAIL PROTECTED])> wrote:
>  
> > It can be reasonable to turn off automatic region splitting if you know
> > your rowkey distribution well and can easily ensure good parallelism
> > across your region servers (i.e., manually or through the HBase API).
> > Sometimes it's even the best way to keep the number of regions to a
> > minimum (many companies do this). There is an example of pre-splitting
> > regions in the Reference Guide.
> >  
> > As for your region size, increasing it to 2 GB or even more will help
> > reduce the number of regions and StoreFiles.
> >  
> > On Fri, Jul 13, 2012 at 10:31 PM, Rob Roland <[EMAIL PROTECTED] (mailto:[EMAIL PROTECTED])> wrote:
> >  
> > > Hi all,
> > >  
> > > The HBase instance I'm managing has grown to the point that it has way
> > > too many regions per server: 5 region servers with 1010 regions each on
> > > HBase 0.90.4-cdh3u2. I want to bring this region count under control. The
> > > cluster is currently running with the default region size of 256 MB, and
> > > the data is spread across 17 tables. I've turned on compression for all
> > > the column families, which is great, as my region count is growing much
> > > more slowly now. I've looked through HDFS at the individual regions, and
> > > they seem rather small (40-50 MB), which is not surprising given the major
> > > compactions after enabling compression. My total hbase folder size in
> > > HDFS (hadoop fs -dus /hbase) is 926,939,501,499 bytes.
> > >  
> > > My question is - what's the best strategy for handling this?
> > >  
> > > What I assume from reading the docs:
> > >  
> > > 1. Increase the hbase.hregion.max.filesize to something more reasonable,
> > > like 2 GB.
> > > 2. Bring the cluster offline and merge regions.
> > >  
> > > Is there a good way to determine the actual region sizes, other than
> > > manually, so that I can do the merges and end up with the most efficient
> > > regions, size-wise?
> > >  
> > > At what point is it a good idea to turn off automatic region splits and
> > > manually manage them?
> > >  
> > > Thanks,
> > >  
> > > Rob Roland
> > > Senior Software Engineer
> > > Simply Measured, Inc.
> > >  
> >  
> >  
> >  
> >  
> > --
> > Adrien Mogenet
> > 06.59.16.64.22
> > http://www.mogenet.me
> >  
>  
>  
>  
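Adrien's pre-splitting suggestion fits Rob's SHA-hash rowkeys well, because hash prefixes are spread uniformly. A minimal sketch, assuming the rowkeys begin with a lowercase hex-encoded digest, divides the hex keyspace into equal ranges and emits the boundaries as split keys. This is not the Reference Guide's example. The function name and the 8-character prefix width are illustrative choices. The resulting keys would be handed, as byte arrays, to whatever table-creation call your client uses to create a pre-split table.

```python
def hex_split_keys(num_regions: int, width: int = 8) -> list[str]:
    """Return num_regions - 1 evenly spaced split points as hex prefixes.

    Assumes rowkeys start with a lowercase hex digest, so fixed-width hex
    prefixes sort in the same order as the numeric keyspace.
    """
    if num_regions < 1:
        raise ValueError("num_regions must be >= 1")
    space = 16 ** width  # number of distinct prefixes of this width
    if num_regions > space:
        raise ValueError("width too small for that many regions")
    return [format(i * space // num_regions, f"0{width}x")
            for i in range(1, num_regions)]


if __name__ == "__main__":
    # Illustrative: 10 regions per table, as in Rob's question.
    keys = hex_split_keys(10)
    print(keys)
    # Split keys are usually supplied to the client as byte arrays.
    split_bytes = [key.encode("ascii") for key in keys]
    print(len(split_bytes))
```

Raw binary digests would need byte-array boundaries instead of ASCII hex strings, and rowkeys that prepend anything before the hash would need the prefix included in each split key.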