HBase, mail # dev - HBase read performance and HBase client


Re: HBase read performance and HBase client
Vladimir Rodionov 2013-08-01, 04:57
A smaller block size (32K) does not give any performance gain, which is
strange, to say the least.
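
For context, the HFile block size is a per-column-family setting. Below is a
minimal sketch of how a 32K block size would be configured at table-creation
time against the 0.94 client API; the table and family names are hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class SmallBlockTable {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        // Block size is set per column family; 64K is the default.
        HColumnDescriptor family = new HColumnDescriptor("f");       // hypothetical
        family.setBlocksize(32 * 1024);                              // 32K blocks
        HTableDescriptor table = new HTableDescriptor("test_table"); // hypothetical
        table.addFamily(family);
        admin.createTable(table);
        admin.close();
      }
    }
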
On Wed, Jul 31, 2013 at 9:33 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> Would be interesting to profile MultiGet. With RTT of 0.1ms, the internal
> RS friction is probably the main contributor.
> In fact MultiGet just loops over the set at the RS and calls single gets
> on the various regions.
>
> Each Get needs to reseek into the block (even when it is cached, since KVs
> have variable size).
>
> There are HBASE-6136 and HBASE-8362.
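>
> For readers following along, here is a minimal sketch of such a multi-get
> from the client side of the 0.94 API (table, family, qualifier, and row
> names are hypothetical). The batch is grouped by region server and sent as
> one RPC per RS, which then executes the gets one at a time:
>
>     import java.util.ArrayList;
>     import java.util.List;
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.hbase.HBaseConfiguration;
>     import org.apache.hadoop.hbase.client.Get;
>     import org.apache.hadoop.hbase.client.HTable;
>     import org.apache.hadoop.hbase.client.Result;
>     import org.apache.hadoop.hbase.util.Bytes;
>
>     public class MultiGetSketch {
>       public static void main(String[] args) throws Exception {
>         Configuration conf = HBaseConfiguration.create();
>         HTable table = new HTable(conf, "test_table");
>         List<Get> batch = new ArrayList<Get>();
>         for (int i = 0; i < 100; i++) {                 // batch size 100
>           Get get = new Get(Bytes.toBytes("row-" + i));
>           get.addColumn(Bytes.toBytes("f"), Bytes.toBytes("q"));
>           batch.add(get);
>         }
>         // One RPC per region server; the RS loops over the gets, reseeking
>         // into the (cached) block for each one since KVs have variable size.
>         Result[] results = table.get(batch);
>         System.out.println("fetched " + results.length + " rows");
>         table.close();
>       }
>     }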
>
>
> -- Lars
>
> ________________________________
> From: Vladimir Rodionov <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
> Sent: Wednesday, July 31, 2013 7:27 PM
> Subject: Re: HBase read performance and HBase client
>
>
> Some final numbers:
>
> Test config:
>
> HBase 0.94.6
> blockcache=true, block size = 64K, KV size = 62 bytes (raw).
>
> 5 clients: 96GB RAM, 16(32) CPUs (2.2GHz), CentOS 5.7
> 1 RS Server: the same config.
>
> Local network with ping between hosts: 0.1 ms
>
>
> 1. HBase client hits the wall at ~ 50K per sec regardless of # of CPU,
> threads, IO pool size and other settings.
> 2. The HBase server was able to sustain 170K per sec (with 64K block size),
> all from block cache. KV size = 62 bytes (very small). This is for the
> single Get op, 60 threads per client, 5 clients (on different hosts).
> 3. Multi-get hits the wall at the same 170K-200K per sec. Batch sizes
> tested: 30, 100. Absolutely the same performance as with batch size = 1.
> Multi-get has some internal issue on the RegionServer side, maybe excessive
> locking or something else.
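>
> A minimal sketch of the kind of load described in point 2 - single Get ops
> from many client threads - assuming the 0.94 API; the table name, row key,
> and iteration count are hypothetical:
>
>     import java.util.concurrent.ExecutorService;
>     import java.util.concurrent.Executors;
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.hbase.HBaseConfiguration;
>     import org.apache.hadoop.hbase.client.Get;
>     import org.apache.hadoop.hbase.client.HTable;
>     import org.apache.hadoop.hbase.util.Bytes;
>
>     public class SingleGetLoad {
>       public static void main(String[] args) throws Exception {
>         final Configuration conf = HBaseConfiguration.create();
>         ExecutorService pool = Executors.newFixedThreadPool(60);
>         for (int t = 0; t < 60; t++) {     // 60 threads per client
>           pool.submit(new Runnable() {
>             public void run() {
>               try {
>                 // HTable is not thread-safe: one instance per thread. All
>                 // instances built from the same conf share one connection.
>                 HTable table = new HTable(conf, "test_table");
>                 Get get = new Get(Bytes.toBytes("row-1"));
>                 get.addColumn(Bytes.toBytes("f"), Bytes.toBytes("q"));
>                 for (int i = 0; i < 1000000; i++) {
>                   table.get(get);          // single-Get op, served from cache
>                 }
>                 table.close();
>               } catch (Exception e) {
>                 e.printStackTrace();
>               }
>             }
>           });
>         }
>         pool.shutdown();
>       }
>     }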
>
>
>
>
>
> On Tue, Jul 30, 2013 at 2:01 PM, Vladimir Rodionov
> <[EMAIL PROTECTED]> wrote:
>
> > 1. SCR are enabled
> > 2. A single Configuration for all tables did not work well, but I will
> > try it again
> > 3. With Nagle's I had 0.8ms avg, without it - 0.4ms - I see the difference
> >
> >
> > On Tue, Jul 30, 2013 at 1:50 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> >
> >> With Nagle's you'd see something around 40ms. You are not saying 0.8ms
> >> RTT is bad, right? Are you seeing ~40ms latencies?
> >>
> >> This thread has gotten confusing.
> >>
> >> I would try these:
> >> * one Configuration for all tables. Or even use a single
> >> HConnection/Threadpool and use the HTable(byte[], HConnection,
> >> ExecutorService) constructor (see the sketch after this list)
> >> * disable Nagle's: set both ipc.server.tcpnodelay and
> >> hbase.ipc.client.tcpnodelay to true in hbase-site.xml (both client *and*
> >> server)
> >> * increase hbase.client.ipc.pool.size in client's hbase-site.xml
> >> * enable short circuit reads (details depend on the exact version of
> >> Hadoop). Google will help :)
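> >>
> >> A minimal sketch pulling the first three suggestions together, assuming
> >> the 0.94-era HConnectionManager.createConnection API; the table name and
> >> pool sizes are hypothetical:
> >>
> >>     import java.util.concurrent.ExecutorService;
> >>     import java.util.concurrent.Executors;
> >>     import org.apache.hadoop.conf.Configuration;
> >>     import org.apache.hadoop.hbase.HBaseConfiguration;
> >>     import org.apache.hadoop.hbase.client.HConnection;
> >>     import org.apache.hadoop.hbase.client.HConnectionManager;
> >>     import org.apache.hadoop.hbase.client.HTable;
> >>     import org.apache.hadoop.hbase.util.Bytes;
> >>
> >>     public class SharedConnectionSketch {
> >>       public static void main(String[] args) throws Exception {
> >>         Configuration conf = HBaseConfiguration.create();
> >>         // Disable Nagle's; the server needs the same two settings in
> >>         // its own hbase-site.xml.
> >>         conf.setBoolean("hbase.ipc.client.tcpnodelay", true);
> >>         conf.setBoolean("ipc.server.tcpnodelay", true);
> >>         // More sockets per region server from this client.
> >>         conf.setInt("hbase.client.ipc.pool.size", 10);
> >>
> >>         HConnection connection = HConnectionManager.createConnection(conf);
> >>         ExecutorService pool = Executors.newFixedThreadPool(60);
> >>         // All HTable instances share one connection and one thread pool.
> >>         HTable table = new HTable(Bytes.toBytes("test_table"), connection, pool);
> >>         // ... issue gets against the table ...
> >>         table.close();
> >>         connection.close();
> >>         pool.shutdown();
> >>       }
> >>     }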
> >>
> >> -- Lars
> >>
> >>
> >> ----- Original Message -----
> >> From: Vladimir Rodionov <[EMAIL PROTECTED]>
> >> To: [EMAIL PROTECTED]
> >> Cc:
> >> Sent: Tuesday, July 30, 2013 1:30 PM
> >> Subject: Re: HBase read performance and HBase client
> >>
> >> Does this hbase.ipc.client.tcpnodelay (default: false) explain the poor
> >> single-thread performance and high latency (0.8ms in a local network)?
> >>
> >>
> >> On Tue, Jul 30, 2013 at 1:22 PM, Vladimir Rodionov
> >> <[EMAIL PROTECTED]> wrote:
> >>
> >> > One more observation: one Configuration instance per HTable gives a
> >> > 50% boost compared to a single Configuration object for all HTables -
> >> > from 20K to 30K.
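> >> >
> >> > (A short sketch of this pattern; one plausible explanation for the
> >> > boost is that separate Configuration instances can map to separate
> >> > underlying HConnections, so the client spreads load over more than
> >> > one socket. The table name is hypothetical.)
> >> >
> >> >     import org.apache.hadoop.conf.Configuration;
> >> >     import org.apache.hadoop.hbase.HBaseConfiguration;
> >> >     import org.apache.hadoop.hbase.client.HTable;
> >> >
> >> >     public class PerTableConf {
> >> >       public static void main(String[] args) throws Exception {
> >> >         // One Configuration instance per HTable, as described above.
> >> >         Configuration c1 = HBaseConfiguration.create();
> >> >         HTable t1 = new HTable(c1, "test_table");
> >> >         Configuration c2 = HBaseConfiguration.create();
> >> >         HTable t2 = new HTable(c2, "test_table");
> >> >         // ... drive gets against t1 and t2 from different threads ...
> >> >         t1.close();
> >> >         t2.close();
> >> >       }
> >> >     }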
> >> >
> >> >
> >> > On Tue, Jul 30, 2013 at 1:17 PM, Vladimir Rodionov
> >> > <[EMAIL PROTECTED]> wrote:
> >> >
> >> >> This thread dump was taken while the client was sending 60 requests
> >> >> in parallel (at least, in theory). There are 50 server handler threads.
> >> >>
> >> >>
> >> >> On Tue, Jul 30, 2013 at 1:15 PM, Vladimir Rodionov <
> >> >> [EMAIL PROTECTED]> wrote:
> >> >>
> >> >>> Sure, here it is:
> >> >>>
> >> >>> http://pastebin.com/8TjyrKRT
> >> >>>
> >> >>> Is epoll used not only to read/write HDFS but to connect/listen to
> >> >>> clients as well?