HBase, mail # user - experiencing high latency for few reads in HBase


Re: experiencing high latency for few reads in HBase
Kiru Pakkirisamy 2013-08-29, 18:33
But the locality index should not matter if the table is IN_MEMORY and you are running the test after a few warm-up runs, to make sure the blocks are already cached (i.e. blockCacheHit is high, or blockCacheMiss is low), right?
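That warm-cache assumption can be sanity-checked from the region server counters. A minimal sketch — the counter values below are hypothetical; the real numbers come from the region server metrics page or JMX (blockCacheHitCount / blockCacheMissCount):

```java
public class CacheCheck {
    // Hit ratio of the block cache: the fraction of reads served from memory
    // rather than HDFS.
    public static double hitRatio(long hits, long misses) {
        return (double) hits / (hits + misses);
    }

    public static void main(String[] args) {
        long hits = 9_500_000L;   // hypothetical blockCacheHitCount
        long misses = 500_000L;   // hypothetical blockCacheMissCount
        System.out.println(hitRatio(hits, misses)); // 0.95 -> reads rarely touch HDFS
    }
}
```

If the ratio stays high across runs, poor HDFS locality should barely show up in read latency.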

Regards,
- kiru
Kiru Pakkirisamy | webcloudtech.wordpress.com
________________________________
 From: Vladimir Rodionov <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Sent: Thursday, August 29, 2013 11:11 AM
Subject: RE: experiencing high latency for few reads in HBase
 

Usually, either a cluster restart or a major compaction helps improve the locality index.
There is an issue with region assignment after table disable/enable in 0.94.x (x < 11) which
breaks HDFS locality. It is fixed in 0.94.11.

You can write your own routine to manually "localize" a particular table using the public HBase client API.

But this won't help you stay within 1 sec anyway.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: [EMAIL PROTECTED]

________________________________________
From: Saurabh Yahoo [[EMAIL PROTECTED]]
Sent: Thursday, August 29, 2013 10:52 AM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: experiencing high latency for few reads in HBase

Thanks Vlad.

Quick question. I notice hdfsBlocksLocalityIndex is around 50 in all region servers.

Could that be a problem? If it is, how do we solve it? We already ran a major compaction after ingesting the data.

Thanks,
Saurabh.

On Aug 29, 2013, at 12:17 PM, Vladimir Rodionov <[EMAIL PROTECTED]> wrote:

> Yes. HBase won't guarantee strict sub-second latency.
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: [EMAIL PROTECTED]
>
> ________________________________________
> From: Saurabh Yahoo [[EMAIL PROTECTED]]
> Sent: Thursday, August 29, 2013 2:49 AM
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: Re: experiencing high latency for few reads in HBase
>
> Hi Vlad,
>
> We do have strict latency requirement as it is financial data requiring direct access from clients.
>
> Are you saying that it is not possible to achieve sub-second latency using HBase (because it is based on Java)?
>
>
>
>
>
>
>
> On Aug 28, 2013, at 8:10 PM, Vladimir Rodionov <[EMAIL PROTECTED]> wrote:
>
>> Increasing the Java heap size will actually make latency worse.
>> You can't guarantee a 1 sec max latency if you run a Java app (unless your heap size is much less than 1GB).
>> I have never heard of a strict maximum latency limit. Usually it's a 99%, 99.9%, or 99.99% query percentile.
>>
>> You can greatly reduce your 99.xxx% percentile latency by storing your data in two replicas on two different region servers.
>> Issue two read operations to those two region servers in parallel and take the first response. Probability theory states that the probability
>> of two independent events (slow requests) occurring together is the product of the events' probabilities.
>>
>>
>> Best regards,
>> Vladimir Rodionov
>> Principal Platform Engineer
>> Carrier IQ, www.carrieriq.com
>> e-mail: [EMAIL PROTECTED]
>>
>> ________________________________________
>> From: Saurabh Yahoo [[EMAIL PROTECTED]]
>> Sent: Wednesday, August 28, 2013 4:18 PM
>> To: [EMAIL PROTECTED]
>> Subject: Re: experiencing high latency for few reads in HBase
>>
>> Thanks Kiru,
>>
>> A scan is not an option for our use cases. Our reads are pretty random.
>>
>> Any other suggestions to bring down the latency?
>>
>> Thanks,
>> Saurabh.
>>
>>
>> On Aug 28, 2013, at 7:01 PM, Kiru Pakkirisamy <[EMAIL PROTECTED]> wrote:
>>
>>> Saurabh, we are able to read 600K row x columns in 400 msec. We converted what was a 40-million-row table into 400K rows and columns. We get about 100 of the rows from this 400K, do quite a bit of calculation in the coprocessor (almost a group-by/order-by), and return within this time.
>>> Maybe you should consider replacing the MultiGets with a Scan with a Filter. I like the FuzzyRowFilter, even though you might need to match against an exact key. It works only with fixed-length keys.
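The two-replica hedged-read idea from Vladimir's earlier message can be sketched in plain Java. This is only an illustration of the pattern, not HBase API: the two callables stand in for Gets against two region servers holding the same row, `invokeAny` returns whichever answers first, and the replica names and latencies are made up.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class HedgedRead {
    // Stand-in for a read against one replica; a real version would issue
    // an HBase Get to that region server. Latency is simulated with sleep.
    static String readReplica(String name, long latencyMs) {
        try {
            Thread.sleep(latencyMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return name;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        // Issue the same read to both replicas; take the first response and
        // let the executor cancel the straggler.
        String winner = pool.invokeAny(List.of(
                (Callable<String>) () -> readReplica("replica-1", 50), // slow right now
                (Callable<String>) () -> readReplica("replica-2", 5))); // fast
        pool.shutdownNow();
        System.out.println(winner); // prints "replica-2"

        // The independence argument from the thread: if one server is slow
        // with probability p, both are slow only with probability p * p.
        double p = 0.01;           // each replica misses its SLA 1% of the time
        double bothSlow = p * p;   // ~1e-4, i.e. roughly a 99.99th percentile
        System.out.println(bothSlow);
    }
}
```

The tail latency of the hedged read is governed by the faster of the two responses, which is why the 99th percentile of one server becomes roughly the 99.99th of the pair.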