HBase user mailing list: How to avoid frequent and blocking Memstore Flushes? (was: Re: About HBase Memstore Flushes)


Re: How to avoid frequent and blocking Memstore Flushes? (was: Re: About HBase Memstore Flushes)
The number of regions per RS has always been a good point of debate.

There's a hardcoded maximum of 1500; however, you'll see performance degrade before that limit.

I've tried to set a goal of keeping the number of regions per RS down around 500-600, because I didn't have time to monitor the system that closely.
(Again, this was an R&D machine where, if we lost it or it wasn't at 100% peak, I wasn't going to get tarred and feathered. :-P )

So if you increase your heap, monitor your number of regions, and increase the region size as needed, you should be OK.
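A quick sketch of the region-count arithmetic behind this advice. All numbers here are hypothetical illustrations (not from the thread, and not recommendations): the point is only that, for a fixed amount of data per server, raising the region size is what pulls the region count back toward a target like the 500-600 mentioned above.

```python
# Rough back-of-the-envelope: region count per RegionServer is roughly
# the data it hosts divided by the max region size. Hypothetical numbers.

GB = 1024 ** 3

def regions_per_server(data_per_server_bytes, region_size_bytes):
    """Approximate region count: data hosted / max region size."""
    return data_per_server_bytes / region_size_bytes

# 1 TB of data per server with 1 GB regions -> ~1024 regions (too many).
# Doubling the region size to 2 GB halves the count to ~512, near the
# 500-600 target mentioned in this message.
print(regions_per_server(1024 * GB, 1 * GB))
print(regions_per_server(1024 * GB, 2 * GB))
```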

On a side note... is there any correlation between the underlying block size and the region size in terms of performance? I never had time to check it out.

Thx

-Mike

On May 18, 2012, at 9:05 PM, Otis Gospodnetic wrote:

> I have a feeling Alex is raising an important issue, but maybe it's not getting attention because it's tl;dr?
>
> Andy Purtell just wrote something very related in a different thread:
>
>>  "The amount of heap allotted for memstore is fixed by configuration.
>>   HBase maintains this global limit as part of a strategy to avoid out
>>   of memory conditions. Therefore, as the number of regions grows, the
>>   available space for each region's memstore shrinks proportionally. If
>>   you have a heap sized too small for region hosting demand, then when
>>   the number of regions gets up there, HBase will be constantly flushing
>>   tiny files and compacting endlessly."
>
> So isn't the above a problem for anyone using HBase?  More precisely, this part:
> "...when the number of regions gets up there, HBase will be constantly flushing tiny files and compacting endlessly."
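The quoted point can be illustrated with back-of-the-envelope numbers (all hypothetical, using what I believe were the default values of `hbase.regionserver.global.memstore.upperLimit` and `hbase.hregion.memstore.flush.size` at the time; check your hbase-site.xml): the global memstore budget is a fixed fraction of the heap, so the average share per active region shrinks as the region count grows.

```python
# Hypothetical illustration: fixed global memstore budget divided
# across a growing number of regions.

MB = 1024 ** 2
GB = 1024 ** 3

heap = 8 * GB
upper_limit = 0.4       # hbase.regionserver.global.memstore.upperLimit (assumed default)
flush_size = 128 * MB   # hbase.hregion.memstore.flush.size (assumed default)

global_budget = heap * upper_limit  # 3.2 GB total memstore space

for regions in (25, 100, 1000):
    share = global_budget / regions
    print(regions, round(share / MB, 1), share < flush_size)
# With 25 regions, each memstore can still reach the 128 MB flush size.
# With 1000 regions, the average share is ~3.3 MB: forced flushes write
# tiny files and compactions never catch up.
```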
>
> If this is not a problem, how do people work around this?  Somehow keep the number of regions mostly constant, or...?
>
>
> Thanks!
>
> Otis
> ----
> Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm
>
>
>
>> ________________________________
>> From: Alex Baranau <[EMAIL PROTECTED]>
>> To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
>> Sent: Wednesday, May 9, 2012 6:02 PM
>> Subject: Re: About HBase Memstore Flushes
>>
>> Should I maybe create a JIRA issue for that?
>>
>> Alex Baranau
>> ------
>> Sematext :: http://blog.sematext.com/
>>
>>> On Tue, May 8, 2012 at 4:00 PM, Alex Baranau <[EMAIL PROTECTED]> wrote:
>>
>>> Hi!
>>>
>>> Just trying to check that I understand things correctly about configuring
>>> memstore flushes.
>>>
>>> Basically, there are two groups of configuration properties (leaving out
>>> region pre-close flushes):
>>> 1. those that determine when a flush should be triggered
>>> 2. those that determine when a flush should be triggered and updates
>>> should be blocked during flushing
>>>
>>> The 2nd group is for safety reasons: we don't want the memstore to grow
>>> without limit, so we forbid writes unless the memstore has a "bearable"
>>> size. We also don't want flushed files to be too big. These properties are:
>>> * hbase.regionserver.global.memstore.upperLimit &
>>> hbase.regionserver.global.memstore.lowerLimit [1]   (1)
>>> * hbase.hregion.memstore.block.multiplier [2]
>>>
>>> The 1st group (sorry for the reverse order) is about triggering "regular"
>>> flushes. As these can be performed without pausing updates, we want them
>>> to happen before the conditions for "blocking updates" flushes are met.
>>> The property for configuring this is
>>> * hbase.hregion.memstore.flush.size [3]
>>> (* there are also open jira issues for per colfam settings)
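The trigger-order question above can be sketched numerically. This is a hypothetical illustration, not from the thread: the per-region flush fires at `hbase.hregion.memstore.flush.size`, while the per-RegionServer limit fires on the global total, so above some region count evenly spread writes hit the global limit before any single memstore reaches flush.size. The lowerLimit value here is an assumption about the then-current default.

```python
# Hypothetical crossover: n evenly written regions hit the global
# memstore limit when n * per_region_fill >= heap * lower_limit,
# i.e. before any memstore reaches flush_size once n exceeds this.

MB = 1024 ** 2
GB = 1024 ** 3

heap = 8 * GB
lower_limit = 0.35      # hbase.regionserver.global.memstore.lowerLimit (assumed)
flush_size = 128 * MB   # hbase.hregion.memstore.flush.size (assumed default)

crossover = heap * lower_limit / flush_size
print(crossover)  # ~22.4: beyond roughly 22 evenly written regions, the
                  # global limit wins and per-region flushes get smaller.
```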
>>>
>>> As we don't want too-frequent flushes, this option should be big enough
>>> to avoid them. At the same time, it should be small enough that it
>>> triggers flushing *before* the "blocking updates" flushing is triggered.
>>> This configuration is per-region, while (1) is per-RegionServer. So, if
>>> we had a (more or less) constant number of regions per RegionServer, we
>>> could choose a value that satisfies both constraints. However, it is a
>>> usual situation that the number of regions