HBase >> mail # user >> How to avoid frequent and blocking Memstore Flushes? (was: Re: About HBase Memstore Flushes)


I have a feeling Alex is raising an important issue, but maybe it's not getting attention because it's tl;dr?

Andy Purtell just wrote something very related in a different thread:

> "The amount of heap allotted for memstore is fixed by configuration.
>  HBase maintains this global limit as part of a strategy to avoid out
>  of memory conditions. Therefore, as the number of regions grows, the
>  available space for each region's memstore shrinks proportionally. If
>  you have a heap sized too small for region hosting demand, then when
>  the number of regions gets up there, HBase will be flushing constantly
>  tiny files and compacting endlessly."

So isn't the above a problem for anyone using HBase?  More precisely, this part:
"...when the number of regions gets up there, HBase will be flushing constantly tiny files and compacting endlessly."
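
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The 8 GB heap, the 0.4 global memstore fraction, and the 128 MB flush size are illustrative assumptions, not values taken from this thread, and the function names are made up for the example. It also assumes writes are spread evenly across regions, which real workloads rarely do.

```python
# Rough arithmetic only: how much memstore each region gets, on average,
# once the regionserver-wide limit becomes the binding constraint.

def avg_memstore_share_mb(heap_mb, global_limit_fraction, num_regions):
    """Average memstore budget per region under the global limit (MB)."""
    return heap_mb * global_limit_fraction / num_regions

def max_regions_at_full_flush_size(heap_mb, global_limit_fraction, flush_size_mb):
    """Largest region count at which every region could still fill a
    whole flush_size_mb memstore before the global limit is reached."""
    return int(heap_mb * global_limit_fraction // flush_size_mb)

if __name__ == "__main__":
    heap_mb = 8192          # assumed regionserver heap
    global_limit = 0.4      # assumed global memstore fraction
    flush_size_mb = 128     # assumed per-region flush size

    print("regions that fit at full flush size:",
          max_regions_at_full_flush_size(heap_mb, global_limit, flush_size_mb))
    for regions in (10, 25, 100, 500):
        share = avg_memstore_share_mb(heap_mb, global_limit, regions)
        print(f"{regions:4d} regions -> ~{share:.1f} MB of memstore per region")
```

Under these assumptions, once the region count goes past a few dozen, the average share drops well below the per-region flush size, so flushes are driven by the global limit and the flushed files get smaller as regions are added.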

If this is not a problem, how do people work around this?  Somehow keep the number of regions mostly constant, or...?
Thanks!

Otis
----
Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm 

>________________________________
> From: Alex Baranau <[EMAIL PROTECTED]>
>To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
>Sent: Wednesday, May 9, 2012 6:02 PM
>Subject: Re: About HBase Memstore Flushes
>
>Should I maybe create a JIRA issue for that?
>
>Alex Baranau
>------
>Sematext :: http://blog.sematext.com/
>
>On Tue, May 8, 2012 at 4:00 PM, Alex Baranau <[EMAIL PROTECTED]> wrote:
>
>> Hi!
>>
>> Just trying to check that I understand things correctly about configuring
>> memstore flushes.
>>
>> Basically, there are two groups of configuration properties (leaving out
>> region pre-close flushes):
>> 1. determines when flush should be triggered
>> 2. determines when flush should be triggered and updates should be blocked
>> during flushing
>>
>> The 2nd one is for safety reasons: we don't want the memstore to grow
>> without limit, so we block writes until the memstore is back to a "bearable"
>> size. We also don't want flushed files to be too big. These properties are:
>> * hbase.regionserver.global.memstore.upperLimit &
>> hbase.regionserver.global.memstore.lowerLimit [1]   (1)
>> * hbase.hregion.memstore.block.multiplier [2]
>>
>> 1st group (sorry for reverse order) is about triggering "regular flushes".
>> As flushes can be performed without pausing updates, we want them to happen
>> before conditions for "blocking updates" flushes are met. The property for
>> configuring this is
>> * hbase.hregion.memstore.flush.size [3]
>> (* there are also open jira issues for per colfam settings)
>>
>> As we don't want to perform too frequent flushes, we want to keep this
>> option big enough to avoid that. At the same time we want to keep it small
>> enough so that it triggers flushing *before* the "blocking updates"
>> flushing is triggered. This configuration is per-region, while (1) is per
>> regionserver. So, if we had a (more or less) constant number of regions per
>> regionserver, we could choose the value in such a way that it is not too
>> small, but small enough. However, it is common for the number of regions
>> assigned to a regionserver to vary a lot over the life of a cluster. And we
>> don't want to adjust it over time (which requires RS restarts).
>>
>> Does the thinking above make sense to you? If yes, then here are the questions:
>>
>> A. is it a goal to have a more or less constant number of regions per
>> regionserver? Can anyone share their experience of whether that is achievable?
>> B. or should there be config options for triggering flushes
>> based on regionserver state (not just individual regions or stores)? E.g.:
>>     B.1 given a setting X%, trigger a flush of the biggest memstore (or
>> whatever the logic is for selecting a memstore to flush) when memstores take
>> up X% of heap (similar to (1), but triggers flushing when there's no need to
>> block updates yet)
>>     B.2 any other option that takes the number of regions into account
>>
>> Thoughts?
>>
>> Alex Baranau
>
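
For anyone skimming, option B.1 above could be sketched roughly as follows in Python. This is only an illustration of the proposal, not existing HBase behavior or API: `select_regions_to_flush`, `soft_limit_fraction` (standing in for X%), and the region names are all hypothetical, and "biggest memstore first" is just one possible selection policy.

```python
# Hypothetical sketch of proposal B.1: a regionserver-wide "soft" limit that
# triggers non-blocking flushes before any blocking limit is reached.

def select_regions_to_flush(memstore_sizes_mb, heap_mb, soft_limit_fraction):
    """Pick regions to flush, biggest memstore first, until the total
    memstore usage is at or below soft_limit_fraction of the heap.

    memstore_sizes_mb: dict mapping region name -> current memstore size (MB)
    Returns the region names in the order they would be flushed.
    """
    threshold_mb = heap_mb * soft_limit_fraction
    total_mb = sum(memstore_sizes_mb.values())
    to_flush = []
    # Largest memstores first, so each flush frees as much heap as possible
    # and produces the largest possible files.
    for region, size_mb in sorted(memstore_sizes_mb.items(),
                                  key=lambda item: item[1], reverse=True):
        if total_mb <= threshold_mb:
            break
        to_flush.append(region)
        total_mb -= size_mb
    return to_flush

if __name__ == "__main__":
    sizes = {"region-a": 120, "region-b": 40, "region-c": 90, "region-d": 10}
    print(select_regions_to_flush(sizes, heap_mb=1000, soft_limit_fraction=0.2))
```

The point of the design is that the threshold depends only on total memstore usage on the regionserver, so it would not need retuning as the number of regions changes.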