Sujee Maniyam 2011-11-01, 17:54
Jean-Daniel Cryans 2011-11-01, 18:22
Sujee Maniyam 2011-11-01, 21:34
Jean-Daniel Cryans 2011-11-01, 21:40
Sujee Maniyam 2011-11-01, 21:46
Jean-Daniel Cryans 2011-11-01, 22:02
20 regions/server & <2000 regions/cluster is a good rule of thumb if you
can't profile your workload yet. There are two competing constraints:
1) You need to limit the regions/cluster so the master can have a
reasonable startup time & can handle all the region state transitions via
ZK. Most bigger companies are running 2,000 in production and achieve
reasonable startup times (< 2 minutes for region assignment on cold
start). If you want to test the scalability of that algorithm beyond what
other companies need, admin beware.
2) The more regions/server you have, the faster that recovery can happen
after RS death because you can currently parallelize recovery at
region-granularity. Too many regions/server and #1 starts to be a problem.
More information lets you optimize this formula. Additional considerations:
1) Are you IO-bound or CPU-bound
2) What is your grid topology like
3) What is your network hardware like
4) How many disks (not just size)
5) What is the data locality between RegionServer & DataNode
In the Facebook case, we have 5 racks with 20 nodes each. Servers in the
rack are connected by 1G Eth to a switch with a 10G uplink. We are
network bound. Our saturation point is most commonly on the top-of-rack
switch. With 20 regions/server, we can roughly parallelize our
distributed log splitting within a single rack on RS death (although 2
regions do split off-rack). This minimizes top-of-rack traffic and
optimizes our recovery time. Even if you are CPU-bound, log splitting
(hence recovery time) is an IO-bound operation. A lot of our work on
region assignment is about maximizing data locality, even on RS death, so
we avoid top-of-rack saturation.
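
To make the arithmetic above concrete, here is a rough sketch in Python. The
function names are made up for illustration and are not part of HBase. The
two limits are only the rule of thumb from this thread, not anything HBase
enforces, and the cluster limit is treated as "about 2000" (inclusive). The
per-server figure is raw region data and ignores HDFS replication.

```python
# Rule-of-thumb limits quoted in this thread. These are assumptions for
# illustration, not limits enforced by HBase.
MAX_REGIONS_PER_SERVER = 20
MAX_REGIONS_PER_CLUSTER = 2000


def total_regions(servers: int, regions_per_server: int) -> int:
    """Total regions the master has to assign across the cluster."""
    return servers * regions_per_server


def data_per_server_gb(regions_per_server: int, region_size_gb: float) -> float:
    """Raw region data hosted per server, ignoring HDFS replication."""
    return regions_per_server * region_size_gb


def within_rule_of_thumb(servers: int, regions_per_server: int) -> bool:
    """True if the layout stays inside both rule-of-thumb limits."""
    return (
        regions_per_server <= MAX_REGIONS_PER_SERVER
        and total_regions(servers, regions_per_server) <= MAX_REGIONS_PER_CLUSTER
    )


if __name__ == "__main__":
    # Facebook case from this message: 5 racks x 20 nodes, 20 regions/server.
    print(total_regions(5 * 20, 20))
    # The 10-node cluster from the question, at 1000 regions/server of 1GB each.
    print(data_per_server_gb(1000, 1.0))
    print(within_rule_of_thumb(10, 1000))
    # Same cluster at 20 regions/server.
    print(within_rule_of_thumb(10, 20))
```

With 20 regions/server, the 100-node layout lands on 2000 regions for the
cluster, which is where the two numbers in the rule of thumb meet.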
On 11/1/11 10:54 AM, "Sujee Maniyam" <[EMAIL PROTECTED]> wrote:
>My HBase cluster is 10 nodes, each node has 12 cores, 48G RAM, 24TB disk.
>My region size is 1GB.
>Any guidelines on how many regions can a RS handle comfortably?
>I vaguely remember reading somewhere to have no more than 1000 regions /
>server; that comes to 1TB / server. Seems pretty low for the current
>Any rules of thumb? experiences?
lars hofhansl 2011-11-03, 02:10
Nicolas Spiegelberg 2011-11-03, 02:55
Doug Meil 2011-11-03, 13:25
Michel Segel 2011-11-04, 11:37
Mikael Sitruk 2011-11-04, 12:32
Sujee Maniyam 2011-11-03, 18:32