Re: How to avoid frequent and blocking Memstore Flushes? (was: Re: About HBase Memstore Flushes)
Michael Segel 2012-05-19, 20:36
Yup. And it's part voodoo science and gut feel.
Somehow I think that will always be the case.

On May 19, 2012, at 1:19 PM, Andrew Purtell wrote:

> It depends on workload.
>
> Right now it's up to the operator to notice how the interactions between configuration and workload play out and make adjustments as needed.
>
> With 0.94+ you can set a limit that tells the regionserver to stop splitting after N regions are hosted on it. This makes sense because if you have way more regions than you will ever have a large enough cluster to distribute them reasonably, additional splits have diminishing returns. Regions aren't a logical notion, they correspond with physical files and buffers. Consider setting N to something like 500, that's my ballpark for reasonable, totally unscientific of course.
>
> - Andy
>
> On May 19, 2012, at 6:03 AM, Michael Segel <[EMAIL PROTECTED]> wrote:
>
>> The number of regions per RS has always been a good point of debate.
>>
>> There's a max number of 1500 (hardcoded); however, you'll see performance degrade before that limit.
>>
>> I've tried to set a goal of keeping the number of regions per RS down around 500-600 because I didn't have time to monitor the system that closely.
>> (Again this was an R&D machine where if we lost it, or it wasn't at 100% peak, I wasn't going to get tarred and feathered. :-P )
>>
>> So if you increase your Heap, monitor your # of regions, and increase region size as needed, you should be ok.
>>
>> On a side note... is there any correlation of the underlying block size to the region size in terms of performance? I never had time to check it out.
>>
>> Thx
>>
>> -Mike
>>
>> On May 18, 2012, at 9:05 PM, Otis Gospodnetic wrote:
>>
>>> I have a feeling Alex is raising an important issue, but maybe it's not getting attention because it's tl;dr?
>>>
>>> Andy Purtell just wrote something very related in a different thread:
>>>
>>>> "The amount of heap allotted for memstore is fixed by configuration.
>>>> HBase maintains this global limit as part of a strategy to avoid out
>>>> of memory conditions. Therefore, as the number of regions grows, the
>>>> available space for each region's memstore shrinks proportionally. If
>>>> you have a heap sized too small for region hosting demand, then when
>>>> the number of regions gets up there, HBase will be flushing constantly
>>>> tiny files and compacting endlessly."
>>>
>>> So isn't the above a problem for anyone using HBase? More precisely, this part:
>>> "...when the number of regions gets up there, HBase will be flushing constantly tiny files and compacting endlessly."
>>>
>>> If this is not a problem, how do people work around this? Somehow keep the number of regions mostly constant, or...?
>>>
>>>
>>> Thanks!
>>>
>>> Otis
>>> ----
>>> Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm
>>>
>>>
>>>
>>>> ________________________________
>>>> From: Alex Baranau <[EMAIL PROTECTED]>
>>>> To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
>>>> Sent: Wednesday, May 9, 2012 6:02 PM
>>>> Subject: Re: About HBase Memstore Flushes
>>>>
>>>> Should I maybe create a JIRA issue for that?
>>>>
>>>> Alex Baranau
>>>> ------
>>>> Sematext :: http://blog.sematext.com/
>>>>
>>>> On Tue, May 8, 2012 at 4:00 PM, Alex Baranau <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Hi!
>>>>>
>>>>> Just trying to check that I understand things correctly about configuring
>>>>> memstore flushes.
>>>>>
>>>>> Basically, there are two groups of configuration properties (leaving out
>>>>> region pre-close flushes):
>>>>> 1. determines when flush should be triggered
>>>>> 2. determines when flush should be triggered and updates should be blocked
>>>>> during flushing
>>>>>
>>>>> 2nd one is for safety reasons: we don't want memstore to grow without a
>>>>> limit, so we forbid writes unless memstore has "bearable" size. Also we
>>>>> don't want flushed files to be too big. These properties are: