Re: 3x slowdown after moving from HBase 0.90.3 to HBase 0.92.1
Vincent Barat 2012-11-21, 09:02

I've checked my 30 RPC handlers; they are all in a WAITING state.
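One way to repeat this check on a jstack dump is to count the `java.lang.Thread.State` of every handler thread. The Python below is a minimal sketch, not an HBase tool: `handler_states` is a hypothetical helper, and it assumes the handler threads are named with the prefix "IPC Server handler" (an assumption about 0.92's thread naming; adjust `HANDLER_PREFIX` to whatever your dump actually shows). The sample dump is made up for illustration.

```python
import re
from collections import Counter

# Assumption: RPC handler threads are named like "IPC Server handler 0 on 60020";
# change HANDLER_PREFIX if your jstack output uses a different name.
HANDLER_PREFIX = "IPC Server handler"

# A jstack thread block starts with the quoted thread name...
THREAD_HEADER = re.compile(r'^"([^"]+)"')
# ...followed by a line such as "   java.lang.Thread.State: WAITING (parking)".
STATE_LINE = re.compile(r"^\s*java\.lang\.Thread\.State:\s+(\w+)")


def handler_states(jstack_text, prefix=HANDLER_PREFIX):
    """Count Thread.State values of threads whose name starts with `prefix`."""
    states = Counter()
    current = None  # name of the thread block currently being read
    for line in jstack_text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            current = header.group(1)
            continue
        state = STATE_LINE.match(line)
        if state and current is not None:
            if current.startswith(prefix):
                states[state.group(1)] += 1
            current = None  # only the first state line belongs to this thread


    return states


# Hypothetical two-thread dump, for illustration only.
sample = (
    '"IPC Server handler 0 on 60020" daemon prio=10 tid=0x1 nid=0x2 waiting on condition\n'
    "   java.lang.Thread.State: WAITING (parking)\n"
    '"main" prio=10 tid=0x3 nid=0x4 runnable\n'
    "   java.lang.Thread.State: RUNNABLE\n"
)
print(dict(handler_states(sample)))  # {'WAITING': 1}
```

A handler that is WAITING (parking) is usually blocked on the call queue waiting for work, i.e. idle, so if all 30 are in that state the slowdown is probably not caused by handler exhaustion. It is still worth checking the stack frames, since a handler can also wait on a lock while serving a request.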

Here is an extract of the metrics for one of our region servers (RS); all of them look similar:

requestsPerSecond=593, numberOfOnlineRegions=584,
numberOfStores=1147, numberOfStorefiles=1980,
storefileIndexSizeMB=15, rootIndexSizeKB=16219,
totalStaticIndexSizeKB=246127, totalStaticBloomSizeKB=12936,
memstoreSizeMB=1421, readRequestsCount=633241097,
writeRequestsCount=9375846, compactionQueueSize=0, flushQueueSize=0,
usedHeapMB=3042, maxHeapMB=4591, blockCacheSizeMB=890.19,
blockCacheFreeMB=257.65, blockCacheCount=14048,
blockCacheHitCount=5854936149, blockCacheMissCount=14761288,
blockCacheEvictedCount=4870523, blockCacheHitRatio=99%,
blockCacheHitCachingRatio=99%, hdfsBlocksLocalityIndex=29
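The raw counters in this extract can be turned into a few ratios that are more precise than the rounded percentages. The Python below is only an illustration: `parse_metrics` and `block_cache_hit_ratio` are hypothetical helpers, and the sketch assumes the dump is a plain comma-separated list of numeric `key=value` pairs, optionally ending in `%`.

```python
def parse_metrics(text):
    """Parse a comma-separated "key=value" metrics dump into a dict of floats.

    Assumes every value is numeric, optionally followed by '%'.
    """
    metrics = {}
    for item in text.replace("\n", " ").split(","):
        item = item.strip()
        if "=" not in item:
            continue  # skip empty fragments, e.g. after a trailing comma
        key, value = item.split("=", 1)
        metrics[key.strip()] = float(value.strip().rstrip("%"))
    return metrics


def block_cache_hit_ratio(m):
    """Hit ratio from the raw counters rather than the rounded percentage."""
    hits = m["blockCacheHitCount"]
    misses = m["blockCacheMissCount"]
    return hits / (hits + misses)


# Figures copied from the region server extract above.
rs_dump = """readRequestsCount=633241097,
writeRequestsCount=9375846, blockCacheHitCount=5854936149,
blockCacheMissCount=14761288, hdfsBlocksLocalityIndex=29"""

m = parse_metrics(rs_dump)
print(f"block cache hit ratio: {block_cache_hit_ratio(m):.2%}")  # 99.75%
print(f"reads per write: {m['readRequestsCount'] / m['writeRequestsCount']:.1f}")
print(f"HDFS block locality: {m['hdfsBlocksLocalityIndex']:.0f}%")
```

The hit ratio (about 99.75%) is consistent with the reported 99%, and reads outnumber writes by roughly 67 to 1. Assuming hdfsBlocksLocalityIndex is a percentage of local HDFS blocks, a value of 29 would mean most reads that miss the block cache go over the network.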

On 21/11/12 05:53, Alok Singh wrote:
> Do your PUTs and GETs have small amounts of data? If yes, then you can
> increase the number of handlers.
> We have an 8-node cluster on 0.92.1, and these are some of the settings
> we changed from 0.90.4:
> hbase.regionserver.handler.count = 150
> hbase.hregion.max.filesize=2147483648 (2GB)
> The region servers run with a 16GB heap (-Xmx16000M).
> With these settings, at peak we can handle ~2K concurrent clients.
> Alok
> On Tue, Nov 20, 2012 at 8:21 AM, Vincent Barat <[EMAIL PROTECTED]> wrote:
>> Hi,
>> We have changed some parameters on our 16(!) region servers: 1GB more -Xmx,
>> more RPC handlers (from 10 to 30) and a longer timeout, but nothing seems to
>> improve the response time:
>> - Scans with HBase 0.92 are 3x SLOWER than with HBase 0.90.3
>> - Many simultaneous gets lead to a huge slowdown in batch put and random
>> read response times
>> ... despite the fact that our RS CPU load is really low (10%)
>> Note: we have not (yet) enabled MSLAB, nor direct (short-circuit) reads on HDFS.
>> Any ideas, please? I'm really stuck on this issue.
>> Best regards,
>> On 16/11/12 20:55, Vincent Barat wrote:
>>> Hi,
>>> Right now (and previously with 0.90.3) we are using the default value (10).
>>> We are now trying 30 to see if it is better.
>>> Thanks for your concern
>>> On 16/11/12 18:13, Ted Yu wrote:
>>>> Vincent:
>>>> What's the value for hbase.regionserver.handler.count ?
>>>> I assume you kept the same value as in 0.90.3.
>>>> Thanks
>>>> On Fri, Nov 16, 2012 at 8:14 AM, Vincent Barat <[EMAIL PROTECTED]> wrote:
>>>>> On 16/11/12 01:56, Stack wrote:
>>>>>> On Thu, Nov 15, 2012 at 5:21 AM, Guillaume Perrot <[EMAIL PROTECTED]> wrote:
>>>>>>> It happens when several tables are being compacted and/or when there
>>>>>>> are several scanners running.
>>>>>> It happens for a particular region?  Anything you can tell about the
>>>>>> server looking in your cluster monitoring?  Is it running hot?  What
>>>>>> do the hbase regionserver stats in UI say?  Anything interesting about
>>>>>> compaction queues or requests?
>>>>> Hi, thanks for your answer, Stack. I will take the lead on this thread
>>>>> from now on.
>>>>> It does not happen on any particular region. Actually, things have gotten
>>>>> better now that compactions have been performed on all tables and have
>>>>> stopped.
>>>>> Nevertheless, we are seeing a dramatic drop in performance (especially on
>>>>> random gets) across the whole cluster:
>>>>> even though we doubled our number of region servers (from 8 to 16), and
>>>>> even though the region servers' CPU load is only about 10% to 30%,
>>>>> performance is really bad: very often a slight increase in requests
>>>>> leads to clients blocking on requests, with very long response times. It
>>>>> looks like contention or a deadlock somewhere in the HBase client and C code.
>>>>>> If you look at the thread dump, are all handlers occupied serving
>>>>>> requests? Could these timed-out requests not get into the server?
