-RE: Recommended Node Size Limits
Jonathan Gray 2011-01-14, 17:39
One of the most important factors to look at is how the number of regions relates to how much heap is available for your RegionServers, and then how that will impact your expected MemStore flush sizes. More than total number of regions, this is about the number of actively written to regions.
Assuming I have randomly distributed writes evenly across all regions... If I have 2GB of heap available for all MemStores, and I have 1000 regions on each node, then this means I have ~2MB available for each region's MemStore. This is really low. You'll be flushing 2MB files and that will mean more compacting and less efficient utilization of IO.
If 4GB of heap for MemStores and 200 regions, then you get 20MB for each region. This is approaching a more sane value.
Again, things change significantly depending on the nature of your application, though it sounds like you do have very randomly distributed writes.
Working backwards, if I wanted an expected flush size of 64MB and I had 8GB of total heap, I might give 50% to the MemStores (so 4GB), and then I'd want to target (4096 / 64 = 64) regions per server. This seems low but 64 shards per server should still be sufficient. The BigTable paper describes ~100 shards/node in production. One way to get this control over the number of regions is the pre-split your table at creation time (there is an API that allows you to define your keyspace and number of splits you want). Then you can turn your split size way up, effectively preventing further splits. Again, this is for randomly distributed requests.
Regions that don't take significant amounts of writes (or any writes at all) don't need to be considered here. There's a low cost associated with serving higher numbers of inactive regions (1000s).
> -----Original Message-----
> From: Wayne [mailto:[EMAIL PROTECTED]]
> Sent: Friday, January 14, 2011 6:40 AM
> To: [EMAIL PROTECTED]
> Subject: Re: Recommended Node Size Limits
> No suggestions? We are trying to size our production nodes and are afraid
> that 1k+ regions per node is "too" much. We spent 6 months with Cassandra
> before realizing that it does not scale *UP* at all and that more than 500GB
> per node is suicide in terms of read latency degradation and compaction (we
> had 30 hour compaction with 1TB nodes). We would prefer to not have to
> find out these "surprises" on our own with HBase. Any real world production
> experience in terms of sizing would be greatly appreciated. What criteria
> triggers more nodes other than concurrent read/write load?
> On Wed, Jan 12, 2011 at 11:27 AM, Wayne <[EMAIL PROTECTED]> wrote:
> > What is the recommended max node size in terms of data (using lzo
> > compression), region counts, and region sizes? I know there is no hard
> > limit and it should depend on the required load in terms of concurrent
> > reads/writes that the hardware can handle but I do not think that is true.
> > Based on real experience how many regions per node is too many? What
> > max region size is too big? Do production clusters have 3TB nodes with
> > thousands of very large regions? Where are the hidden cliffs that we
> should avoid?
> > Thanks.