HBase >> mail # user >> How to avoid frequent and blocking Memstore Flushes? (was: Re: About HBase Memstore Flushes)


Re: How to avoid frequent and blocking Memstore Flushes? (was: Re: About HBase Memstore Flushes)
Yup. And it's part voodoo science and gut feel.

Somehow I think that will always be the case.

On May 19, 2012, at 1:19 PM, Andrew Purtell wrote:

> It depends on workload.
>
> Right now it's up to the operator to notice how the interactions between configuration and workload play out and make adjustments as needed.
>
> With 0.94+ you can set a limit that tells the regionserver to stop splitting after N regions are hosted on it. This makes sense because if you have far more regions than any cluster you will ever run could distribute reasonably, additional splits have diminishing returns. Regions aren't just a logical notion; they correspond to physical files and buffers. Consider setting N to something like 500. That's my ballpark for reasonable, totally unscientific of course.
>
>    - Andy
>
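[Editor's sketch] The split limit Andy describes can be modeled in a few lines. This is only an illustration of the rule "stop splitting once N regions are hosted", not HBase's actual code; the function name and the default of 500 (Andy's ballpark) are assumptions made for the example. As far as I know the setting in 0.94+ is `hbase.regionserver.regionSplitLimit`, but check the name and default against the hbase-default.xml shipped with your version.

```python
def should_split(hosted_regions: int, split_limit: int = 500) -> bool:
    """Hypothetical model of a per-regionserver split limit.

    Returns True while the server hosts fewer regions than the limit,
    and False once the limit is reached, so no further splits happen.
    """
    if split_limit < 0:
        raise ValueError("split_limit must be non-negative")
    return hosted_regions < split_limit


if __name__ == "__main__":
    for hosted in (100, 499, 500, 1500):
        print(hosted, should_split(hosted))
```

With a limit in place, regions on a full server keep growing instead of splitting, which is why Mike's advice to increase region size as needed goes hand in hand with it.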
> On May 19, 2012, at 6:03 AM, Michael Segel <[EMAIL PROTECTED]> wrote:
>
>> The number of regions per RS has always been a good point of debate.
>>
>> There's a hardcoded max of 1500; however, you'll see performance degrade before that limit.
>>
>> I've tried to set a goal of keeping the number of regions per RS down around 500-600 because I didn't have time to monitor the system that closely.
>> (Again this was an R&D machine where if we lost it, or it wasn't at 100% peak, I wasn't going to get tarred and feathered. :-P  )
>>
>> So if you increase your heap, monitor your # of regions, and increase region size as needed, you should be ok.
>>
>> On a side note... is there any correlation of the underlying block size to the region size in terms of performance? I never had time to check it out.
>>
>> Thx
>>
>> -Mike
>>
>> On May 18, 2012, at 9:05 PM, Otis Gospodnetic wrote:
>>
>>> I have a feeling Alex is raising an important issue, but maybe it's not getting attention because it's tl;dr?
>>>
>>> Andy Purtell just wrote something very related in a different thread:
>>>
>>>> "The amount of heap allotted for memstore is fixed by configuration.
>>>
>>>> HBase maintains this global limit as part of a strategy to avoid out
>>>> of memory conditions. Therefore, as the number of regions grows, the
>>>> available space for each region's memstore shrinks proportionally. If
>>>> you have a heap sized too small for region hosting demand, then when
>>>> the number of regions gets up there, HBase will be flushing constantly
>>>> tiny files and compacting endlessly."
>>>
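[Editor's sketch] The arithmetic behind Andy's point can be worked through directly. The numbers below are assumptions chosen for illustration (an 8 GB heap, 40% of heap reserved for all memstores, a 128 MB per-region flush size), and the model assumes every region is written to evenly, which real workloads rarely do. The function names are hypothetical and not part of any HBase API.

```python
def memstore_share_mb(heap_mb: float, global_fraction: float, regions: int) -> float:
    """Heap available to each region's memstore if the global budget is split evenly."""
    if regions <= 0:
        raise ValueError("regions must be positive")
    return heap_mb * global_fraction / regions


def effective_flush_mb(heap_mb: float, global_fraction: float,
                       regions: int, flush_size_mb: float) -> float:
    """Approximate size at which a region flushes under this simplified model.

    A region flushes at its configured flush size, unless the global
    memstore budget forces a flush earlier.
    """
    return min(flush_size_mb, memstore_share_mb(heap_mb, global_fraction, regions))


if __name__ == "__main__":
    HEAP_MB = 8192          # assumed regionserver heap
    GLOBAL_FRACTION = 0.4   # assumed share of heap for all memstores
    FLUSH_SIZE_MB = 128     # assumed per-region flush size
    for regions in (25, 100, 500, 1500):
        size = effective_flush_mb(HEAP_MB, GLOBAL_FRACTION, regions, FLUSH_SIZE_MB)
        print(f"{regions:5d} actively written regions -> flushes of about {size:.1f} MB")
```

Under these assumptions the per-region share falls well below the configured flush size once a few hundred regions are being written to, which is the "flushing constantly tiny files and compacting endlessly" regime described in the quote.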
>>> So isn't the above a problem for anyone using HBase?  More precisely, this part:
>>> "...when the number of regions gets up there, HBase will be flushing constantly tiny files and compacting endlessly."
>>>
>>> If this is not a problem, how do people work around this?  Somehow keep the number of regions mostly constant, or...?
>>>
>>>
>>> Thanks!
>>>
>>> Otis
>>> ----
>>> Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm
>>>
>>>
>>>
>>>> ________________________________
>>>> From: Alex Baranau <[EMAIL PROTECTED]>
>>>> To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
>>>> Sent: Wednesday, May 9, 2012 6:02 PM
>>>> Subject: Re: About HBase Memstore Flushes
>>>>
>>>> Should I maybe create a JIRA issue for that?
>>>>
>>>> Alex Baranau
>>>> ------
>>>> Sematext :: http://blog.sematext.com/
>>>>
>>>> On Tue, May 8, 2012 at 4:00 PM, Alex Baranau <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Hi!
>>>>>
>>>>> Just trying to check that I understand things correctly about configuring
>>>>> memstore flushes.
>>>>>
>>>>> Basically, there are two groups of configuration properties (leaving out
>>>>> region pre-close flushes):
>>>>> 1. determines when flush should be triggered
>>>>> 2. determines when flush should be triggered and updates should be blocked
>>>>> during flushing
>>>>>
>>>>> 2nd one is for safety reasons: we don't want memstore to grow without a
>>>>> limit, so we forbid writes unless memstore has "bearable" size. Also we
>>>>> don't want flushed files to be too big. These properties are: