Re: HBase read performance and HBase client
One more observation: one Configuration instance per HTable gives a 50% boost
compared to a single Configuration object shared by all HTables - from 20K to
30K requests per second.
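A minimal sketch of the two variants being compared, using the 0.94-era client API (the table name is a placeholder, and whether each Configuration really maps to its own HConnection depends on the client version's connection caching):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;

    public class ConfPerTable {
      public static void main(String[] args) throws IOException {
        // Variant 1: one Configuration shared by all HTables. The client
        // caches connections per configuration, so every table may end up
        // multiplexed over the same HConnection and its locks.
        Configuration shared = HBaseConfiguration.create();
        HTable t1 = new HTable(shared, "test_table");

        // Variant 2: a fresh Configuration per HTable - the setup reported
        // above as going from 20K to 30K requests per second.
        Configuration perTable = HBaseConfiguration.create();
        HTable t2 = new HTable(perTable, "test_table");

        t1.close();
        t2.close();
      }
    }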
On Tue, Jul 30, 2013 at 1:17 PM, Vladimir Rodionov <[EMAIL PROTECTED]> wrote:

> This thread dump was taken while the client was sending 60 requests in
> parallel (in theory, at least). There are 50 server handler threads.
>
>
> On Tue, Jul 30, 2013 at 1:15 PM, Vladimir Rodionov <[EMAIL PROTECTED]> wrote:
>
>> Sure, here it is:
>>
>> http://pastebin.com/8TjyrKRT
>>
>> Isn't epoll used not only to read/write HDFS, but also to accept and
>> listen for client connections?
>>
>>
>> On Tue, Jul 30, 2013 at 12:31 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
>>
>>> Can you show us what the thread dump looks like when the threads are
>>> BLOCKED? There aren't that many locks on the read path when reading
>>> out of the block cache, and epoll would only happen if you need to hit
>>> HDFS, which you're saying is not happening.
>>>
>>> J-D
>>>
>>> On Tue, Jul 30, 2013 at 12:16 PM, Vladimir Rodionov
>>> <[EMAIL PROTECTED]> wrote:
>>> > I am hitting data in the block cache, of course. The data set is small
>>> > enough to fit comfortably into the block cache, and all requests are
>>> > directed to the same Region to guarantee single-RS testing.
>>> >
>>> > To Ted:
>>> >
>>> > Yes, it's CDH 4.3. What's the difference between 94.10 and 94.6 with
>>> > respect to read performance?
>>> >
>>> >
>>> > On Tue, Jul 30, 2013 at 12:06 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
>>> >
>>> >> That's a tough one.
>>> >>
>>> >> One thing that comes to mind is socket reuse. It used to come up
>>> >> more often, and it's an issue people hit when doing lots of random
>>> >> reads. Try enabling tcp_tw_recycle, but I'm not guaranteeing
>>> >> anything :)
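For reference, enabling it is a one-line sysctl on the client box; this is a sketch only, since tcp_tw_recycle is known to misbehave behind NAT and is best treated as a test-only knob:

    # check the current setting
    sysctl net.ipv4.tcp_tw_recycle
    # recycle TIME_WAIT sockets faster (tcp_tw_reuse is the milder option)
    sysctl -w net.ipv4.tcp_tw_recycle=1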
>>> >>
>>> >> Also, if you _just_ want to saturate something, be it CPU or network,
>>> >> wouldn't it be better to hit data only in the block cache? That way
>>> >> you have the lowest overhead.
>>> >>
>>> >> Last thing I wanted to mention is that yes, the client doesn't scale
>>> >> very well. I would suggest you give the asynchbase client a run.
>>> >>
>>> >> J-D
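For anyone who wants to try that suggestion, a minimal asynchbase sketch follows; the quorum address, table, and row key are placeholders, and join() is used only to keep the example short - the point of asynchbase is to chain callbacks on the Deferred instead of blocking:

    import java.util.ArrayList;
    import org.hbase.async.GetRequest;
    import org.hbase.async.HBaseClient;
    import org.hbase.async.KeyValue;

    public class AsyncGet {
      public static void main(String[] args) throws Exception {
        // A single HBaseClient is meant to be shared by all threads; it
        // multiplexes requests over non-blocking sockets.
        HBaseClient client = new HBaseClient("zk-host:2181");
        GetRequest get = new GetRequest("test_table", "row-0001");
        // get() returns a Deferred immediately; join() blocks for the result.
        ArrayList<KeyValue> row = client.get(get).join();
        System.out.println("cells: " + row.size());
        client.shutdown().join();
      }
    }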
>>> >>
>>> >> On Tue, Jul 30, 2013 at 11:23 AM, Vladimir Rodionov
>>> >> <[EMAIL PROTECTED]> wrote:
>>> >> > I have been doing quite extensive testing of different read scenarios:
>>> >> >
>>> >> > 1. blockcache disabled/enabled
>>> >> > 2. data is local/remote (no good hdfs locality)
>>> >> >
>>> >> > and it turned out that I cannot saturate one RS using a single
>>> >> > client host (comparable in CPU power and RAM):
>>> >> >
>>> >> > I am running a client app with 60 active read threads (using
>>> >> > multi-get) directed at one particular RS, and that RS's CPU load is
>>> >> > 100-150% (out of 3200% available) - i.e. roughly 5% utilization.
>>> >> >
>>> >> > All threads in the RS are either in the BLOCKED (wait) or the
>>> >> > IN_NATIVE (epoll) state.
>>> >> >
>>> >> > I attribute this to the HBase client implementation, which does not
>>> >> > seem to scale (I am going to dig into the client later today).
>>> >> >
>>> >> > Some numbers: the maximum I could get from single gets (60 threads)
>>> >> > is 30K per sec; multi-get gives ~75K per sec (60 threads).
>>> >> >
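For reference, a multi-get with the stock 0.94-era client batches many keys into a single call, which is presumably where the single-get vs. multi-get gap above comes from; a sketch with a placeholder table and keys:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class MultiGet {
      public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "test_table");
        try {
          // One call carries the whole batch instead of one RPC per key.
          List<Get> batch = new ArrayList<Get>();
          for (int i = 0; i < 100; i++) {
            batch.add(new Get(Bytes.toBytes("row-" + i)));
          }
          Result[] results = table.get(batch);
          System.out.println("rows fetched: " + results.length);
        } finally {
          table.close();
        }
      }
    }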
>>> >> > What are my options? I want to measure the limits, and I do not
>>> >> > want to run a cluster of clients against just ONE Region Server.
>>> >> >
>>> >> > RS config: 96GB RAM, 16 (32) CPU
>>> >> > Client:    48GB RAM,  8 (16) CPU
>>> >> >
>>> >> > Best regards,
>>> >> > Vladimir Rodionov
>>> >> > Principal Platform Engineer
>>> >> > Carrier IQ, www.carrieriq.com
>>> >> > e-mail: [EMAIL PROTECTED]
>>> >> >
>>> >> >