HBase, mail # user - Re: HBase scanner LeaseException


Vincent Barat 2012-11-22, 08:21
Apparently, my problem is more closely related to the one described here:
http://www.nosql.se/tags/hbase-rpc-timeout/

I don't really understand why next() on our scanners would be called
less than once every 60 s, and I actually suspect this is NOT the case:
we never had a single scanner timeout exception while running 0.90.3;
this issue appeared only with 0.92.

Anyway, increasing hbase.rpc.timeout seems to work.
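To make the 60 s figure concrete: with scanner caching set to N, the client fetches N rows per server-side next() RPC and only calls the server again after it has consumed them locally, so the gap between RPCs grows with both N and the client's per-row work. The sketch below checks that gap against the lease, assuming the 60 s period quoted above and a hypothetical, constant per-row processing time; the function names are illustrative and not part of the HBase API.

```python
# Hypothetical sketch: does the client consume a cached batch slower than
# the scanner lease allows? Assumes a 60 s lease (as quoted in the thread)
# and a constant per-row client-side processing time; both are assumptions.

LEASE_PERIOD_S = 60.0  # assumed scanner lease period, from the "60 s" above


def seconds_between_rpcs(caching: int, per_row_s: float) -> float:
    """Time the client spends consuming one cached batch before the next RPC."""
    return caching * per_row_s


def lease_may_expire(caching: int, per_row_s: float,
                     lease_s: float = LEASE_PERIOD_S) -> bool:
    """True if the gap between server-side next() calls exceeds the lease."""
    return seconds_between_rpcs(caching, per_row_s) > lease_s


if __name__ == "__main__":
    # Caching values mentioned in the thread (64 and 1024), with an
    # assumed 0.1 s of client work per row.
    for caching in (64, 1024):
        gap = seconds_between_rpcs(caching, 0.1)
        print(f"caching={caching}: {gap:.1f} s between RPCs, "
              f"lease may expire: {lease_may_expire(caching, 0.1)}")
```

If the estimated gap stays well under 60 s for realistic per-row costs, client-side slowness alone would not explain the lease expiring, which is consistent with suspecting something else in 0.92.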

We will continue our investigation, but my guess is that there is an
issue in 0.92 related to how HBase handles scanner leases.

Best regards,

On 21/11/12 09:23, Vincent Barat wrote:
>
> On 21/11/12 06:05, Stack wrote:
>> On Tue, Nov 20, 2012 at 8:21 AM, Vincent Barat
>> <[EMAIL PROTECTED]> wrote:
>>> We have changed some parameters on our 16(!) region servers:
>>> 1 GB more -Xmx, more RPC handlers (from 10 to 30), a longer timeout,
>>> but nothing seems to improve the response time:
>>>
>> Have you taken a look at the perf chapter, Vincent?
>> http://hbase.apache.org/book.html#performance
>>
>> Did you carry forward your old hbase-default.xml, or did you remove it?
>> (0.92 should have its defaults in hbase-X.X.X.jar; some defaults will
>> have changed.)
> We use the new default settings for HBase, with just a few changes
> (more RPC handlers and a longer timeout, though the latter was a bad idea).
>>> - Scans with HBase 0.92 are 3x SLOWER than with HBase 0.90.3
>> Any scan caching going on?
> Yes, scanner caching is set between 64 and 1024 depending on the need.
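Scanner caching trades the number of round trips against the size (and latency) of each RPC and the client memory needed to hold a batch: a scan over R rows with caching C needs roughly ceil(R / C) server-side next() RPCs. The arithmetic below is a sketch with a hypothetical table size; it ignores region boundaries (each region gets its own server-side scanner, which adds a few extra RPCs) and the final call that finds the scanner exhausted.

```python
import math


def scan_rpcs(total_rows: int, caching: int) -> int:
    """Approximate number of next() RPCs for a scan, ignoring region splits."""
    if caching < 1:
        raise ValueError("caching must be >= 1")
    return math.ceil(total_rows / caching)


if __name__ == "__main__":
    rows = 1_000_000  # hypothetical table size
    for caching in (1, 64, 1024):
        print(f"caching={caching}: ~{scan_rpcs(rows, caching)} RPCs")
```

Going from 64 to 1024 cuts the RPC count by 16x, but each batch then takes 16x longer for the client to consume between calls.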
>>> - A lot of simultaneous gets lead to a huge slowdown of batch
>>> put & random read response times
>>>
>> Are the gets returning lots of data? If you thread dump the server at
>> this time (see the top of the regionserver UI), can you see what we are
>> hung up on? Are all handlers occupied?
> We will check this...
>>> ... despite the fact that our RS CPU load is really low (10%)
>>>
>> As has been suggested earlier, perhaps up the handlers?
>>
>>
>>> Note: we have not (yet) activated MSLAB, nor direct reads on HDFS.
>>>
>> MSLAB will help you avoid stop-the-world GCs. Direct reads of HDFS
>> should speed up random access.
> OK, I guess we will give it a try, but as a second step.
>
> Thanks for your help
>>
>> St.Ack
>>
>>> Any ideas, please? I'm really stuck on this issue.
>>>
>>> Best regards,
>>>
>>> On 16/11/12 20:55, Vincent Barat wrote:
>>>> Hi,
>>>>
>>>> Right now (and previously with 0.90.3) we were using the default value
>>>> (10). We are now trying to increase it to 30 to see if that helps.
>>>>
>>>> Thanks for your concern
>>>>
>>>> On 16/11/12 18:13, Ted Yu wrote:
>>>>> Vincent:
>>>>> What's the value for hbase.regionserver.handler.count ?
>>>>>
>>>>> I assume you kept the same value as in 0.90.3
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Fri, Nov 16, 2012 at 8:14 AM, Vincent Barat <[EMAIL PROTECTED]>
>>>>> wrote:
>>>>>
>>>>>> On 16/11/12 01:56, Stack wrote:
>>>>>>
>>>>>>> On Thu, Nov 15, 2012 at 5:21 AM, Guillaume Perrot <[EMAIL PROTECTED]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> It happens when several tables are being compacted and/or when there
>>>>>>>> are several scanners running.
>>>>>>>>
>>>>>>> Does it happen for a particular region? Anything you can tell about the
>>>>>>> server from your cluster monitoring? Is it running hot? What do the
>>>>>>> HBase regionserver stats in the UI say? Anything interesting about
>>>>>>> compaction queues or requests?
>>>>>>>
>>>>>> Hi, thanks for your answer, Stack. I will take the lead on this thread
>>>>>> from now on.
>>>>>>
>>>>>> It does not happen on any particular region. Actually, things are
>>>>>> better now that compactions have been performed on all tables and have
>>>>>> stopped.
>>>>>>
>>>>>> Nevertheless, we face a dramatic decrease in the performance of the
>>>>>> overall cluster (especially on random gets):
>>>>>>
>>>>>> Despite the fact that we doubled our number of region servers (from