Re: How to avoid frequent and blocking Memstore Flushes? (was: Re: About HBase Memstore Flushes)
The number of regions per RS has always been a good point of debate.

There's a hardcoded maximum of 1500; however, you'll see performance degrade before that limit.

I've aimed to keep the number of regions per RS down around 500-600, because I didn't have time to monitor the system that closely.
(Again this was an R&D machine where if we lost it, or it wasn't at 100% peak, I wasn't going to get tarred and feathered. :-P  )

So if you increase your heap, monitor your number of regions, and increase region size as needed, you should be OK.
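For what it's worth, the knobs I mean look something like this, in hbase-site.xml and hbase-env.sh (values are just illustrative, not recommendations):

    <!-- hbase-site.xml: larger regions mean fewer regions per RS -->
    <property>
      <name>hbase.hregion.max.filesize</name>
      <value>4294967296</value> <!-- let regions grow to ~4 GB before splitting -->
    </property>

    # hbase-env.sh: and give the region server more heap to match
    export HBASE_HEAPSIZE=8000   # in MB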

On a side note... is there any correlation between the underlying block size and the region size in terms of performance? I never had time to check it out.

Thx

-Mike

On May 18, 2012, at 9:05 PM, Otis Gospodnetic wrote:

> I have a feeling Alex is raising an important issue, but maybe it's not getting attention because it's tl;dr?
>
> Andy Purtell just wrote something very related in a different thread:
>
>>  "The amount of heap alloted for memstore is fixed by configuration.
>
>>   HBase maintains this global limit as part of a strategy to avoid out
>>   of memory conditions. Therefore, as the number of regions grow, the
>>   available space for each region's memstore shrinks proportionally. If
>>   you have a heap sized too small for region hosting demand, then when
>>   the number of regions gets up there, HBase will be constantly flushing
>>   tiny files and compacting endlessly."
>
> So isn't the above a problem for anyone using HBase?  More precisely, this part:
> "...when the number of regions gets up there, HBase will be flushing constantly tiny files and compacting endlessly."
>
> If this is not a problem, how do people work around this?  Somehow keep the number of regions mostly constant, or...?
>
>
> Thanks!
>
> Otis
> ----
> Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm
>
>
>
>> ________________________________
>> From: Alex Baranau <[EMAIL PROTECTED]>
>> To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
>> Sent: Wednesday, May 9, 2012 6:02 PM
>> Subject: Re: About HBase Memstore Flushes
>>
>> Should I maybe create a JIRA issue for that?
>>
>> Alex Baranau
>> ------
>> Sematext :: http://blog.sematext.com/
>>
>> On Tue, May 8, 2012 at 4:00 PM, Alex Baranau <[EMAIL PROTECTED]> wrote:
>>
>>> Hi!
>>>
>>> Just trying to check that I understand things correctly about configuring
>>> memstore flushes.
>>>
>>> Basically, there are two groups of configuration properties (leaving out
>>> region pre-close flushes):
>>> 1. those that determine when a flush should be triggered
>>> 2. those that determine when a flush should be triggered and updates
>>> should be blocked during flushing
>>>
>>> The 2nd group is there for safety reasons: we don't want the memstore to
>>> grow without limit, so we block writes until the memstore is back to a
>>> "bearable" size. Also, we don't want flushed files to be too big. These
>>> properties are:
>>> * hbase.regionserver.global.memstore.upperLimit &
>>> hbase.regionserver.global.memstore.lowerLimit [1]   (1)
>>> * hbase.hregion.memstore.block.multiplier [2]
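>>> For illustration, in hbase-site.xml these look like (values are, if I read
>>> hbase-default.xml correctly, the current defaults):
>>>
>>>   <property>
>>>     <name>hbase.regionserver.global.memstore.upperLimit</name>
>>>     <value>0.4</value>   <!-- block updates when memstores hit 40% of heap -->
>>>   </property>
>>>   <property>
>>>     <name>hbase.regionserver.global.memstore.lowerLimit</name>
>>>     <value>0.35</value>  <!-- flush until memstores drop to 35% of heap -->
>>>   </property>
>>>   <property>
>>>     <name>hbase.hregion.memstore.block.multiplier</name>
>>>     <value>2</value>     <!-- block updates when a region's memstore reaches 2 * flush.size -->
>>>   </property>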
>>>
>>> The 1st group (sorry for the reverse order) is about triggering "regular"
>>> flushes. As these flushes can be performed without pausing updates, we want
>>> them to happen before the conditions for "blocking updates" flushes are met.
>>> The property for configuring this is
>>> * hbase.hregion.memstore.flush.size [3]
>>> (* there are also open jira issues for per colfam settings)
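>>> E.g. (again, the value here is just the default, for illustration):
>>>
>>>   <property>
>>>     <name>hbase.hregion.memstore.flush.size</name>
>>>     <value>134217728</value> <!-- flush a region's memstore at 128 MB -->
>>>   </property>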
>>>
>>> As we don't want flushes to be too frequent, we want to keep this option
>>> big enough to avoid that. At the same time, we want it small enough that
>>> it triggers flushing *before* the "blocking updates" flushing kicks in.
>>> This setting is per-region, while (1) is per regionserver. So if we had a
>>> (more or less) constant number of regions per regionserver, we could
>>> choose the value in such a way that it is not too small, but small enough.
>>> However, it is a common situation that the number of regions