Re: How to avoid frequent and blocking Memstore Flushes? (was: Re: About HBase Memstore Flushes)
Michael Segel 2012-05-19, 13:03
The number of regions per RS has always been a good point of debate.
There's a max number of 1500 (hardcoded); however, you'll see performance degrade before that limit.

I've tried to set a goal of keeping the number of regions per RS down around 500-600 because I didn't have time to monitor the system that closely. (Again, this was an R&D machine where if we lost it, or it wasn't at 100% peak, I wasn't going to get tarred and feathered. :-P )

So if you increase your heap, monitor your # of regions, and increase region size as needed, you should be ok.

On a side note... is there any correlation of the underlying block size to the region size in terms of performance? I never had time to check it out.

Thx

-Mike

On May 18, 2012, at 9:05 PM, Otis Gospodnetic wrote:

> I have a feeling Alex is raising an important issue, but maybe it's not getting attention because it's tl;dr?
>
> Andy Purtell just wrote something very related in a different thread:
>
>> "The amount of heap allotted for memstore is fixed by configuration.
>> HBase maintains this global limit as part of a strategy to avoid out
>> of memory conditions. Therefore, as the number of regions grows, the
>> available space for each region's memstore shrinks proportionally. If
>> you have a heap sized too small for region hosting demand, then when
>> the number of regions gets up there, HBase will be flushing constantly
>> tiny files and compacting endlessly."
>
> So isn't the above a problem for anyone using HBase? More precisely, this part:
> "...when the number of regions gets up there, HBase will be flushing constantly tiny files and compacting endlessly."
>
> If this is not a problem, how do people work around this? Somehow keep the number of regions mostly constant, or...?
>
>
> Thanks!
>
> Otis
> ----
> Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm
>
>
>
>> ________________________________
>> From: Alex Baranau <[EMAIL PROTECTED]>
>> To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
>> Sent: Wednesday, May 9, 2012 6:02 PM
>> Subject: Re: About HBase Memstore Flushes
>>
>> Should I maybe create a JIRA issue for that?
>>
>> Alex Baranau
>> ------
>> Sematext :: http://blog.sematext.com/
>>
>> On Tue, May 8, 2012 at 4:00 PM, Alex Baranau <[EMAIL PROTECTED]> wrote:
>>
>>> Hi!
>>>
>>> Just trying to check that I understand things correctly about configuring
>>> memstore flushes.
>>>
>>> Basically, there are two groups of configuration properties (leaving out
>>> region pre-close flushes):
>>> 1. determines when flush should be triggered
>>> 2. determines when flush should be triggered and updates should be blocked
>>> during flushing
>>>
>>> 2nd one is for safety reasons: we don't want memstore to grow without a
>>> limit, so we forbid writes unless memstore has "bearable" size. Also we
>>> don't want flushed files to be too big. These properties are:
>>> * hbase.regionserver.global.memstore.upperLimit &
>>> hbase.regionserver.global.memstore.lowerLimit [1] (1)
>>> * hbase.hregion.memstore.block.multiplier [2]
>>>
>>> 1st group (sorry for reverse order) is about triggering "regular flushes".
>>> As flushes can be performed without pausing updates, we want them to happen
>>> before conditions for "blocking updates" flushes are met. The property for
>>> configuring this is
>>> * hbase.hregion.memstore.flush.size [3]
>>> (* there are also open jira issues for per colfam settings)
>>>
>>> As we don't want to perform too frequent flushes, we want to keep this
>>> option big enough to avoid that. At the same time we want to keep it small
>>> enough so that it triggers flushing *before* the "blocking updates"
>>> flushing is triggered. This configuration is per-region, while (1) is per
>>> regionserver.
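
A rough sketch of the arithmetic behind the per-region versus per-regionserver settings quoted above, in Python. The function names, the assumption that writes are spread evenly across regions, and the sample values (8 GiB heap, 0.4/0.35 limits, 128 MiB flush size, multiplier 2) are illustrative assumptions only; they are not taken from this thread or from HBase's code, and real flush behaviour depends on the actual write distribution.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def memstore_thresholds(heap_bytes, upper_limit, lower_limit,
                        flush_size_bytes, block_multiplier, num_regions):
    """Thresholds implied by the memstore settings, under an even-load assumption.

    upper_limit / lower_limit stand for
    hbase.regionserver.global.memstore.upperLimit / lowerLimit (heap fractions),
    flush_size_bytes for hbase.hregion.memstore.flush.size, and
    block_multiplier for hbase.hregion.memstore.block.multiplier.
    """
    global_upper = heap_bytes * upper_limit   # regionserver-wide: updates blocked above this
    global_lower = heap_bytes * lower_limit   # forced flushing continues down to this
    # Hypothetical even split of the global budget across all regions.
    per_region_share = global_upper / num_regions
    return {
        "global_upper_bytes": global_upper,
        "global_lower_bytes": global_lower,
        # Per-region size at which updates to that region are blocked.
        "region_block_bytes": flush_size_bytes * block_multiplier,
        "per_region_share_bytes": per_region_share,
        # True if a region can reach flush.size before the global limit is hit.
        "regular_flush_comes_first": flush_size_bytes <= per_region_share,
    }


def max_regions_before_global_pressure(heap_bytes, upper_limit, flush_size_bytes):
    """Largest region count at which every region could still reach flush.size
    before the regionserver-wide upper limit is reached (even-load assumption)."""
    return int(heap_bytes * upper_limit // flush_size_bytes)


if __name__ == "__main__":
    for regions in (20, 100, 500):
        t = memstore_thresholds(8 * GIB, 0.4, 0.35, 128 * MIB, 2, regions)
        print(regions,
              round(t["per_region_share_bytes"] / MIB, 1),
              t["regular_flush_comes_first"])
    print(max_regions_before_global_pressure(8 * GIB, 0.4, 128 * MIB))
```

With these sample numbers the global budget is about 3.2 GiB, so only 25 evenly loaded regions could each fill a 128 MiB memstore before the regionserver-wide limit takes over; beyond that, the even share per region drops below the flush size, which is the "constantly flushing tiny files" situation described in the quote.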
>>> So, if we had a constant (more or less) number of regions per
>>> regionserver, we could choose the value in such a way that it is not too
>>> small, but small enough. However it is usual situation when regions number