Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
HBase >> mail # user >> region size/count per regionserver


+
Sujee Maniyam 2011-11-01, 17:54
+
Jean-Daniel Cryans 2011-11-01, 18:22
+
Sujee Maniyam 2011-11-01, 21:34
+
Jean-Daniel Cryans 2011-11-01, 21:40
+
Sujee Maniyam 2011-11-01, 21:46
+
Jean-Daniel Cryans 2011-11-01, 22:02
Copy link to this message
-
Re: region size/count per regionserver
Simple answer
-------------
20 regions/server & <2000 regions/cluster is a good rule of thumb if you
can't profile your workload yet.  You really want to ensure that

1) You need to limits the regions/cluster so the master can have a
reasonable startup time & can handle all the region state transitions via
ZK.  Most bigger companies are running 2,000 in production and achieve
reasonable startup times (< 2 minutes for region assignment on cold
start).  If you want to test the scalability of that algorithm beyond what
other companies need, admin beware.
2) The more regions/server you have, the faster that recovery can happen
after RS death because you can currently parallelize recovery on a
region-granularity.  Too many regions/server and #1 starts to be a problem.

Complicated answer
------------------
More information is optimize this formula.  Additional considerations:

1) Are you IO-bound or CPU-bound
2) What is your grid topology like
3) What is your network hardware like
4) How many disks (not just size)
5) What is the data locality between RegionServer & DataNode

In the Facebook case, we have 5 racks with 20 nodes each.  Servers in the
rack are connected by 1G Eth to a switch with a 10G uplink.  We are
network bound.  Our saturation point is mostly commonly on the top-of-rack
switch.  With 20 regions/server, we can roughly parallelize our
distributed log splitting within a single rack on RS death (although 2
regions do split off-rack).  This minimizes top-of-rack traffic and
optimized our recovery time.  Even if you are CPU-bound, log splitting
(hence recovery time) is an IO-bound operation.  A lot of our work on
region assignment is about maximizing data locality, even on RS death, so
we avoid top-of-rack saturation.
On 11/1/11 10:54 AM, "Sujee Maniyam" <[EMAIL PROTECTED]> wrote:

>HI all,
>My HBase cluster is 10 nodes, each node has 12core ,   48G RAM, 24TB disk,
>10GEthernet.
>My region size is 1GB.
>
>Any guidelines on how many regions can a RS  handle comfortably?
>I vaguely remember reading some where to have no more than 1000 regions /
>server; that comes to 1TB / server.  Seems pretty low for the current
>hardware config.
>
>Any rules of thumb?  experiences?
>
>thanks
>Sujee
>
>http://sujee.net
+
lars hofhansl 2011-11-03, 02:10
+
Nicolas Spiegelberg 2011-11-03, 02:55
+
Doug Meil 2011-11-03, 13:25
+
Michel Segel 2011-11-04, 11:37
+
Mikael Sitruk 2011-11-04, 12:32
+
Sujee Maniyam 2011-11-03, 18:32