I'm looking for more detailed advice about how many regions a table should
have. Disabling automatic splits (often hand-in-hand with disabling
automatic compactions) is often described as an advanced practice, at least
when guaranteeing latency SLAs. Which raises the question: how many regions
should I have? Surely this depends on both the shape of your data and the
expected workload. I've seen "10-20 Regions per RS" thrown around as a
stock answer. My question is: why? Presumably that's 10-20 regions per RS
for all tables rather than per table. That advice seems centered on a
uniform region size, but surely the distribution of ops/load matters more.
Still, where does 10-20 come from? Is it a calculation based on the number
of cores on the RS, like the advice given for parallelizing builds? If so,
how many cores are we assuming the RS has? Is it a calculation based on the
amount of
RAM available? Is 20 regions based on a trade-off between static
allocations and per-region memory overhead? Does 10-20 become 5-15 in a
memory-restricted environment and bump to 20-40 when more RAM is available?
Does it have to do with the number of spindles available on the machine?
Threads like this one give some hints about how the big players operate.
However, that advice looks heavily influenced by the concerns of managing
thousands of regions. How does advice for larger clusters (>50
nodes) differ from smaller clusters (<20 nodes)?
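One way to ground the RAM question: a heuristic sometimes cited is that the number of actively written regions is bounded by the global memstore budget divided by the per-region flush size. Below is a rough sketch of that arithmetic, assuming the common defaults of a 0.4 global memstore fraction and a 128 MB flush size (the function name and parameters here are illustrative, not an official formula):

```python
def active_region_estimate(heap_gb, memstore_fraction=0.4,
                           flush_size_mb=128, column_families=1):
    """Rough upper bound on regions taking writes before global memstore
    pressure forces premature (small) flushes on a RegionServer.

    Assumes defaults resembling hbase.regionserver.global.memstore
    upper limit (0.4) and hbase.hregion.memstore.flush.size (128 MB);
    each column family carries its own memstore.
    """
    memstore_budget_mb = heap_gb * 1024 * memstore_fraction
    return int(memstore_budget_mb / (flush_size_mb * column_families))

# e.g. a 16 GB heap with one column family:
print(active_region_estimate(16))  # -> 51
```

By that arithmetic, "10-20 regions per RS" looks conservative for a 16 GB heap, which suggests the stock answer is driven by more than memstore sizing alone (compaction load, spindles, MTTR after a server failure).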