HBase, mail # dev - HBase read performance and HBase client


Re: HBase read performance and HBase client
lars hofhansl 2013-08-01, 04:33
It would be interesting to profile MultiGet. With an RTT of 0.1 ms, the internal RS friction is probably the main contributor.
In fact, MultiGet just loops over the batch at the RS and calls single gets on the various regions.

Each Get needs to reseek into the block (even when it is cached), since KVs have variable size.

There are HBASE-6136 and HBASE-8362.
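
For illustration, the shape of that server-side loop can be sketched as follows (placeholder names, not the actual 0.94 RegionServer code):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;

// Illustrative sketch only: placeholder types and names, not the real
// RegionServer code path. The point is that a multi-get is served as a
// plain loop of single gets, so every row still pays the full per-Get
// cost: a block-cache lookup plus a reseek inside the block, because
// KeyValues are variable-length and cannot be addressed by offset.
class MultiGetSketch {
  interface PerRowGetter {
    Result get(Get get) throws IOException;   // stands in for the per-row get path
  }

  static List<Result> handleMultiGet(PerRowGetter region, List<Get> gets)
      throws IOException {
    List<Result> results = new ArrayList<Result>(gets.size());
    for (Get get : gets) {
      results.add(region.get(get));           // no batched seek across rows
    }
    return results;
  }
}

If the server does essentially the same per-row work either way, batching mostly saves network round trips, and with a 0.1 ms RTT there is little to save; that is consistent with the multi-get numbers quoted below.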
-- Lars

________________________________
From: Vladimir Rodionov <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
Sent: Wednesday, July 31, 2013 7:27 PM
Subject: Re: HBase read performance and HBase client
Some final numbers:

Test config:

HBase 0.94.6
blockcache=true, block size = 64K, KV size = 62 bytes (raw).

5 clients: 96GB, 16(32) CPUs (2.2GHz), CentOS 5.7
1 RS server: the same config.

Local network with ping between hosts: 0.1 ms
1. The HBase client hits a wall at ~50K ops per sec regardless of the number of CPUs,
threads, IO pool size and other settings.
2. The HBase server was able to sustain 170K ops per sec (with 64K block size), all
from the block cache. KV size = 62 bytes (very small). This is for single Get
ops, 60 threads per client, 5 clients (on different hosts); see the client-loop
sketch below.
3. Multi-get hits a wall at the same 170K-200K per sec. Batch sizes
tested: 30, 100. Absolutely the same performance as with batch size = 1.
Multi-get has some internal issues on the RegionServer side, maybe excessive
locking or something else.
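
For context, a client loop like the one described in point 2 can be sketched as below. This is a hypothetical skeleton against the 0.94-era API, not the actual test code: the table name, row-key scheme and per-second reporting are assumptions; only the 60-threads-per-client figure comes from the numbers above.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical single-Get load generator, not the test code from this
// thread. Each of the 60 worker threads owns its own HTable (HTable is
// not thread-safe); whether they share one Configuration or get one
// each is exactly the variable discussed further down in the thread.
public class GetBench {
  public static void main(String[] args) throws Exception {
    final Configuration conf = HBaseConfiguration.create();
    final AtomicLong ops = new AtomicLong();
    ExecutorService pool = Executors.newFixedThreadPool(60);
    for (int t = 0; t < 60; t++) {
      pool.submit(new Runnable() {
        public void run() {
          try {
            HTable table = new HTable(conf, "testtable");   // assumed table name
            for (long i = 0; ; i++) {
              table.get(new Get(Bytes.toBytes("row-" + (i % 1000000))));
              ops.incrementAndGet();
            }
          } catch (Exception e) {
            e.printStackTrace();
          }
        }
      });
    }
    long prev = 0;
    while (true) {                                          // throughput per second
      Thread.sleep(1000);
      long now = ops.get();
      System.out.println((now - prev) + " gets/sec");
      prev = now;
    }
  }
}

For the multi-get case in point 3, the inner call would become table.get(List<Get>) with batches of 30 or 100, which is where the same 170K-200K ceiling was observed.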

On Tue, Jul 30, 2013 at 2:01 PM, Vladimir Rodionov
<[EMAIL PROTECTED]> wrote:

> 1. SCRs (short-circuit reads) are enabled
> 2. A single Configuration for all tables did not work well, but I will try it
> again
> 3. With Nagle's I had 0.8ms avg, without it 0.4ms - I see the difference
>
>
> On Tue, Jul 30, 2013 at 1:50 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
>> With Nagle's you'd see something around 40ms. You are not saying 0.8ms
>> RTT is bad, right? Are you seeing ~40ms latencies?
>>
>> This thread has gotten confusing.
>>
>> I would try these:
>> * one Configuration for all tables. Or even use a single
>> HConnection/Threadpool and use the HTable(byte[], HConnection,
>> ExecutorService) constructor (see the sketch below)
>> * disable Nagle's: set both ipc.server.tcpnodelay and
>> hbase.ipc.client.tcpnodelay to true in hbase-site.xml (both client *and*
>> server)
>> * increase hbase.client.ipc.pool.size in client's hbase-site.xml
>> * enable short circuit reads (details depend on exact version of Hadoop).
>> Google will help :)
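
A minimal sketch of how the first three suggestions could be wired up on the client (0.94-era API; the property names are the ones given above, the pool size, thread count and table name are example values, and ipc.server.tcpnodelay still needs to go into the server's hbase-site.xml):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HConnection;
import org.apache.hadoop.hbase.client.HConnectionManager;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

// Hedged sketch of the suggested client-side wiring, not a verified tuning recipe.
public class SharedConnectionExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    conf.setBoolean("hbase.ipc.client.tcpnodelay", true);  // disable Nagle's on the client
    conf.setInt("hbase.client.ipc.pool.size", 10);         // more RPC sockets per server (example value)

    // One shared connection and one shared thread pool for the whole client.
    HConnection connection = HConnectionManager.getConnection(conf);
    ExecutorService pool = Executors.newFixedThreadPool(60);

    // HTable stays one-per-thread (it is not thread-safe), but with this
    // constructor all instances reuse the shared connection and pool.
    HTable table = new HTable(Bytes.toBytes("testtable"), connection, pool);
    // ... issue Gets with this table ...
    table.close();
    pool.shutdown();
  }
}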
>>
>> -- Lars
>>
>>
>> ----- Original Message -----
>> From: Vladimir Rodionov <[EMAIL PROTECTED]>
>> To: [EMAIL PROTECTED]
>> Cc:
>> Sent: Tuesday, July 30, 2013 1:30 PM
>> Subject: Re: HBase read performance and HBase client
>>
>> Does this hbase.ipc.client.tcpnodelay (default: false) explain the poor single-
>> thread performance and the high latency (0.8ms in the local network)?
>>
>>
>> On Tue, Jul 30, 2013 at 1:22 PM, Vladimir Rodionov
>> <[EMAIL PROTECTED]> wrote:
>>
>> > One more observation: one Configuration instance per HTable gives a 50%
>> > boost compared to a single Configuration object for all HTables - from
>> > 20K to 30K.
>> >
>> >
>> > On Tue, Jul 30, 2013 at 1:17 PM, Vladimir Rodionov <
>> > [EMAIL PROTECTED]> wrote:
>> >
>> >> This thread dump was taken while the client was sending 60 requests in
>> >> parallel (at least in theory). There are 50 server handler threads.
>> >>
>> >>
>> >> On Tue, Jul 30, 2013 at 1:15 PM, Vladimir Rodionov <
>> >> [EMAIL PROTECTED]> wrote:
>> >>
>> >>> Sure, here it is:
>> >>>
>> >>> http://pastebin.com/8TjyrKRT
>> >>>
>> >>> Is epoll used not only to read/write HDFS but also to connect/listen to
>> >>> clients?
>> >>>
>> >>>
>> >>> On Tue, Jul 30, 2013 at 12:31 PM, Jean-Daniel Cryans <
>> >>> [EMAIL PROTECTED]> wrote:
>> >>>
>> >>>> Can you show us what the thread dump looks like when the threads are
>> >>>> BLOCKED? There aren't that many locks on the read path when reading
>> >>>> out of the block cache, and epoll would only happen if you need to hit
>> >>>> HDFS, which you're saying is not happening.