Re: region size/count per regionserver
The funny thing about tuning... What works for one situation may not work well for others.
We follow the old recommendation of never exceeding 1000 regions per RS, keep the count low (around 100-200), monitor our tables, and change the region size on a table-by-table basis. With that, we are doing OK.
(Of course there are other nasty bugs that kill us... But that's a different thread...)

The point is that you need to decide what makes sense for you and what trade-offs you can live with...

Just my two cents...

Sent from a remote device. Please excuse any typos...

Mike Segel

On Nov 2, 2011, at 9:10 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> Do we know what would need to change in HBase in order to be able to manage more regions per regionserver?
> With 20 regions per server, one would need 300G regions to just utilize 6T of drive space.
> To utilize a regionserver/datanode with 24T drive space the region size would be an insane 1T.
> -- Lars
> ________________________________
> From: Nicolas Spiegelberg <[EMAIL PROTECTED]>
> Cc: Karthik Ranganathan <[EMAIL PROTECTED]>; Kannan Muthukkaruppan <[EMAIL PROTECTED]>
> Sent: Tuesday, November 1, 2011 3:57 PM
> Subject: Re: region size/count per regionserver
> Simple answer
> -------------
> 20 regions/server & <2000 regions/cluster is a good rule of thumb if you
> can't profile your workload yet.  Keep two things in mind:
> 1) You need to limit the regions/cluster so the master can have a
> reasonable startup time & can handle all the region state transitions via
> ZK.  Most bigger companies are running 2,000 in production and achieve
> reasonable startup times (< 2 minutes for region assignment on cold
> start).  If you want to test the scalability of that algorithm beyond what
> other companies need, admin beware.
> 2) The more regions/server you have, the faster recovery can happen
> after RS death, because you can currently parallelize recovery at
> region granularity.  With too many regions/server, #1 starts to be a problem.
> Complicated answer
> ------------------
> With more information you can optimize this formula.  Additional considerations:
> 1) Are you IO-bound or CPU-bound?
> 2) What is your grid topology like?
> 3) What is your network hardware like?
> 4) How many disks do you have (not just their size)?
> 5) What is the data locality between RegionServer & DataNode?
> In the Facebook case, we have 5 racks with 20 nodes each.  Servers in the
> rack are connected by 1G Eth to a switch with a 10G uplink.  We are
> network bound.  Our saturation point is most commonly on the top-of-rack
> switch.  With 20 regions/server, we can roughly parallelize our
> distributed log splitting within a single rack on RS death (although 2
> regions do split off-rack).  This minimizes top-of-rack traffic and
> optimizes our recovery time.  Even if you are CPU-bound, log splitting
> (hence recovery time) is an IO-bound operation.  A lot of our work on
> region assignment is about maximizing data locality, even on RS death, so
> we avoid top-of-rack saturation.
> On 11/1/11 10:54 AM, "Sujee Maniyam" <[EMAIL PROTECTED]> wrote:
>> Hi all,
>> My HBase cluster is 10 nodes; each node has 12 cores, 48G RAM, 24TB disk, and
>> 10G Ethernet.
>> My region size is 1GB.
>> Any guidelines on how many regions an RS can handle comfortably?
>> I vaguely remember reading somewhere to have no more than 1000 regions /
>> server; that comes to 1TB / server.  Seems pretty low for the current
>> hardware config.
>> Any rules of thumb?  experiences?
>> thanks
>> Sujee
>> http://sujee.net
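
The sizing arithmetic that runs through this thread can be checked with a few lines of Python. This is only a rough sketch: the function names are made up for illustration, 1T is assumed to be 1000G, and raw disk is counted as-is, ignoring HDFS replication, compression, and anything else stored on the drives.

```python
# Back-of-the-envelope region arithmetic using figures quoted in the thread.
# Assumptions (not from the thread): 1T = 1000G, and raw disk is counted
# as-is, ignoring HDFS replication, compression, and non-HBase data.

def region_size_needed_gb(disk_gb, regions_per_server):
    """Region size (in GB) needed to fill disk_gb with a fixed region count."""
    return disk_gb / regions_per_server

def data_per_server_gb(regions_per_server, region_size_gb):
    """Data (in GB) one regionserver holds at a given region count and size."""
    return regions_per_server * region_size_gb

def regions_per_cluster(servers, regions_per_server):
    """Total number of regions the master has to assign across the cluster."""
    return servers * regions_per_server

# Lars's examples: 20 regions/server on 6T and on 24T of disk.
print(region_size_needed_gb(6_000, 20))
print(region_size_needed_gb(24_000, 20))

# Sujee's remembered limit: 1000 regions/server at 1GB per region.
print(data_per_server_gb(1000, 1))

# Region totals at 20 regions/server: 5 racks x 20 nodes, and a 10-node cluster.
print(regions_per_cluster(5 * 20, 20))
print(regions_per_cluster(10, 20))
```

Under these assumptions the 6T case gives 300G per region and the 24T case gives 1200G, which Lars rounds to 1T. The 5-rack, 20-node layout at 20 regions/server comes to 2000 regions, right at the cluster-wide rule of thumb, while the 10-node cluster would have only 200.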