

Re: How to avoid frequent and blocking Memstore Flushes? (was: Re: About HBase Memstore Flushes)
Yup. And it's part voodoo science and gut feel.

Somehow I think that will always be the case.

On May 19, 2012, at 1:19 PM, Andrew Purtell wrote:

> It depends on workload.
>
> Right now it's up to the operator to notice how the interactions between configuration and workload play out and make adjustments as needed.
>
> With 0.94+ you can set a limit that tells the regionserver to stop splitting after N regions are hosted on it. This makes sense because if you have far more regions than your cluster will ever be large enough to distribute reasonably, additional splits have diminishing returns. Regions aren't a logical notion; they correspond to physical files and buffers. Consider setting N to something like 500; that's my ballpark for reasonable, totally unscientific of course.
>
>    - Andy
>
> On May 19, 2012, at 6:03 AM, Michael Segel <[EMAIL PROTECTED]> wrote:
>
>> The number of regions per RS has always been a good point of debate.
>>
>> There's a max number of 1500 (hardcoded); however, you'll see performance degrade before that limit.
>>
>> I've tried to set a goal of keeping the number of regions per RS down around 500-600 because I didn't have time to monitor the system that closely.
>> (Again this was an R&D machine where if we lost it, or it wasn't at 100% peak, I wasn't going to get tarred and feathered. :-P  )
>>
>> So if you increase your heap, monitor your # of regions, and increase region size as needed, you should be ok.
>>
>> On a side note... is there any correlation of the underlying block size to the region size in terms of performance? I never had time to check it out.
>>
>> Thx
>>
>> -Mike
>>
>> On May 18, 2012, at 9:05 PM, Otis Gospodnetic wrote:
>>
>>> I have a feeling Alex is raising an important issue, but maybe it's not getting attention because it's tl;dr?
>>>
>>> Andy Purtell just wrote something very related in a different thread:
>>>
>>>> "The amount of heap allotted for memstore is fixed by configuration.
>>>
>>>> HBase maintains this global limit as part of a strategy to avoid out
>>>> of memory conditions. Therefore, as the number of regions grows, the
>>>> available space for each region's memstore shrinks proportionally. If
>>>> you have a heap sized too small for region hosting demand, then when
>>>> the number of regions gets up there, HBase will be flushing constantly
>>>> tiny files and compacting endlessly."
>>>
>>> So isn't the above a problem for anyone using HBase?  More precisely, this part:
>>> "...when the number of regions gets up there, HBase will be flushing constantly tiny files and compacting endlessly."
>>>
>>> If this is not a problem, how do people work around this?  Somehow keep the number of regions mostly constant, or...?
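
The point quoted above, that each region's memstore share shrinks as the region count grows, can be illustrated with some rough arithmetic. This is only a sketch under stated assumptions: an 8 GB regionserver heap, a global memstore fraction of 0.4, a 128 MB per-region flush size, and writes spread evenly over all regions. None of these numbers come from the thread, and the function name is hypothetical.

```python
def per_region_memstore_mb(heap_mb, memstore_fraction, num_regions):
    """Average memstore budget per region, assuming writes hit all regions evenly."""
    if num_regions <= 0:
        raise ValueError("num_regions must be positive")
    return heap_mb * memstore_fraction / num_regions


# Illustrative values only (assumptions, not taken from the thread).
HEAP_MB = 8 * 1024        # assumed regionserver heap
MEMSTORE_FRACTION = 0.4   # assumed share of heap reserved for all memstores
FLUSH_SIZE_MB = 128       # assumed per-region flush threshold

for regions in (25, 100, 500, 1500):
    share = per_region_memstore_mb(HEAP_MB, MEMSTORE_FRACTION, regions)
    if share < FLUSH_SIZE_MB:
        note = "global limit is hit before any region reaches the flush size"
    else:
        note = "regions can reach the flush size"
    print(f"{regions:>5} regions: {share:8.2f} MB per region ({note})")
```

Under these assumptions, once the per-region share drops well below the flush size, flushes are driven by the global limit rather than the per-region threshold, which is the "constantly flushing tiny files" situation described in the quote.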
>>>
>>>
>>> Thanks!
>>>
>>> Otis
>>> ----
>>> Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm
>>>
>>>
>>>
>>>> ________________________________
>>>> From: Alex Baranau <[EMAIL PROTECTED]>
>>>> To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
>>>> Sent: Wednesday, May 9, 2012 6:02 PM
>>>> Subject: Re: About HBase Memstore Flushes
>>>>
>>>> Should I maybe create a JIRA issue for that?
>>>>
>>>> Alex Baranau
>>>> ------
>>>> Sematext :: http://blog.sematext.com/
>>>>
>>>> On Tue, May 8, 2012 at 4:00 PM, Alex Baranau <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Hi!
>>>>>
>>>>> Just trying to check that I understand things correctly about configuring
>>>>> memstore flushes.
>>>>>
>>>>> Basically, there are two groups of configuration properties (leaving out
>>>>> region pre-close flushes):
>>>>> 1. determines when flush should be triggered
>>>>> 2. determines when flush should be triggered and updates should be blocked
>>>>> during flushing
>>>>>
>>>>> 2nd one is for safety reasons: we don't want memstore to grow without a
>>>>> limit, so we forbid writes unless memstore has "bearable" size. Also we
>>>>> don't want flushed files to be too big. These properties are: