Re: How to avoid frequent and blocking Memstore Flushes? (was: Re: About HBase Memstore Flushes)
Michael Segel 2012-05-19, 13:03
The number of regions per RS has always been a good point of debate.
There's a hardcoded maximum of 1500; however, you'll see performance degrade before that limit.
I've tried to set a goal of keeping the number of regions per RS down around 500-600 because I didn't have time to monitor the system that closely.
(Again this was an R&D machine where if we lost it, or it wasn't at 100% peak, I wasn't going to get tarred and feathered. :-P )
So if you increase your heap, monitor your number of regions, and increase region size as needed, you should be OK.
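To make those knobs concrete: the property names below are the real ones from this era of HBase, but the values are purely illustrative, not recommendations. Raising the region split threshold keeps the region count per server down, and the global memstore fraction is what a bigger heap ultimately feeds.

```xml
<!-- hbase-site.xml sketch: illustrative values only -->
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>4294967296</value> <!-- ~4 GB: regions split later, so fewer regions per RS -->
</property>
<property>
  <name>hbase.regionserver.global.memstore.upperLimit</name>
  <value>0.4</value> <!-- fraction of the heap that all memstores together may use -->
</property>
```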
On a side note... is there any correlation between the underlying block size and the region size in terms of performance? I never had time to check it out.
On May 18, 2012, at 9:05 PM, Otis Gospodnetic wrote:
> I have a feeling Alex is raising an important issue, but maybe it's not getting attention because it's tl;dr?
> Andy Purtell just wrote something very related in a different thread:
>> "The amount of heap alloted for memstore is fixed by configuration.
>> HBase maintains this global limit as part of a strategy to avoid out
>> of memory conditions. Therefore, as the number of regions grow, the
>> available space for each region's memstore shrinks proportionally. If
>> you have a heap sized too small for region hosting demand, then when
>> the number of regions gets up there, HBase will be flushing constantly
>> tiny files and compacting endlessly."
> So isn't the above a problem for anyone using HBase? More precisely, this part:
> "...when the number of regions gets up there, HBase will be flushing constantly tiny files and compacting endlessly."
> If this is not a problem, how do people work around this? Somehow keep the number of regions mostly constant, or...?
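To make the quoted concern concrete, here is a back-of-the-envelope sketch. All numbers are illustrative, and it assumes the write load (and hence memstore usage) is spread evenly across regions, which real clusters only approximate:

```python
# How the per-region memstore budget shrinks as regions accumulate.
# Numbers are illustrative, not HBase defaults.

def effective_flush_budget(heap_bytes, upper_limit, num_regions):
    """Global memstore budget divided evenly across active regions."""
    return heap_bytes * upper_limit / num_regions

heap = 8 * 1024**3          # 8 GB regionserver heap
upper = 0.4                 # hbase.regionserver.global.memstore.upperLimit
flush_size = 128 * 1024**2  # hbase.hregion.memstore.flush.size (128 MB)

for regions in (10, 100, 1000):
    budget = effective_flush_budget(heap, upper, regions)
    verdict = "tiny forced flushes" if budget < flush_size else "ok"
    print(f"{regions} regions -> {budget / 1024**2:.1f} MB per region ({verdict})")
```

Once the per-region share drops below hbase.hregion.memstore.flush.size, the global limit rather than the per-region setting is what effectively drives flushes, which is how you end up with the constant tiny flushes described above.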
> Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm
>> From: Alex Baranau <[EMAIL PROTECTED]>
>> To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
>> Sent: Wednesday, May 9, 2012 6:02 PM
>> Subject: Re: About HBase Memstore Flushes
>> Should I maybe create a JIRA issue for that?
>> Alex Baranau
>> Sematext :: http://blog.sematext.com/
>> On Tue, May 8, 2012 at 4:00 PM, Alex Baranau <[EMAIL PROTECTED]> wrote:
>>> Just trying to check that I understand things correctly about configuring
>>> memstore flushes.
>>> Basically, there are two groups of configuration properties (leaving out
>>> region pre-close flushes):
>>> 1. those that determine when a flush should be triggered
>>> 2. those that determine when a flush should be triggered and updates should
>>> be blocked during flushing
>>> The 2nd group is there for safety reasons: we don't want the memstore to grow
>>> without limit, so we forbid writes unless the memstore has a "bearable" size.
>>> Also we don't want flushed files to be too big. These properties are:
>>> * hbase.regionserver.global.memstore.upperLimit &
>>> hbase.regionserver.global.memstore.lowerLimit  (1)
>>> * hbase.hregion.memstore.block.multiplier  (2)
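For reference, here is how those safety-valve settings might look in hbase-site.xml. The property names are the real ones, but the values are illustrative, not recommendations:

```xml
<!-- hbase-site.xml sketch: illustrative values only -->
<property>
  <name>hbase.regionserver.global.memstore.upperLimit</name>
  <value>0.4</value> <!-- updates blocked when all memstores reach 40% of heap -->
</property>
<property>
  <name>hbase.regionserver.global.memstore.lowerLimit</name>
  <value>0.35</value> <!-- forced flushing continues down to 35% of heap -->
</property>
<property>
  <name>hbase.hregion.memstore.block.multiplier</name>
  <value>2</value> <!-- block updates to a region once its memstore hits multiplier * flush.size -->
</property>
```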
>>> The 1st group (sorry for the reverse order) is about triggering "regular" flushes.
>>> As these flushes can be performed without pausing updates, we want them to
>>> happen before the conditions for "blocking updates" flushes are met. The
>>> property for configuring this is
>>> * hbase.hregion.memstore.flush.size 
>>> (* there are also open JIRA issues for per-column-family settings)
>>> As we don't want to perform flushes too frequently, we want to keep this
>>> option big enough to avoid that. At the same time we want to keep it small
>>> enough so that it triggers flushing *before* the "blocking updates"
>>> flushing is triggered. This configuration is per-region, while (1) is per
>>> regionserver. So, if we had a (more or less) constant number of regions per
>>> regionserver, we could choose the value in such a way that it is not too
>>> small, but small enough. However it is a usual situation that the regions number