HBase, mail # dev - HBase read performance and HBase client


Re: HBase read performance and HBase client
Vladimir Rodionov 2013-07-30, 20:52
Exactly, but this thread dump is from the RS under load nevertheless (you can
see that one thread is in JAVA and reading data from a socket).
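
For reference, the figures quoted further down this thread can be turned into per-thread numbers. This is only a back-of-the-envelope sketch: it assumes the RS load is reported top-style (100% per logical core, 32 logical cores) and that each of the 60 client threads blocks on one call at a time. Both are assumptions, not something measured here, and the function names are made up for illustration.

```python
# Back-of-the-envelope figures from the numbers quoted in this thread.
# Assumption: CPU load is reported top-style, i.e. 100% per logical core.
# Assumption: each client thread issues one synchronous call at a time.

LOGICAL_CORES = 32    # "16(32) CPU" on the RS
CLIENT_THREADS = 60   # read threads on the client host


def cpu_utilization(load_percent: float, cores: int = LOGICAL_CORES) -> float:
    """Fraction of total CPU capacity used, given a top-style load percentage."""
    return load_percent / (cores * 100.0)


def per_thread_rate(total_ops_per_sec: float, threads: int = CLIENT_THREADS) -> float:
    """Average operations per second completed by each client thread."""
    return total_ops_per_sec / threads


def implied_latency_ms(total_ops_per_sec: float, threads: int = CLIENT_THREADS) -> float:
    """Average time per call in ms if every thread blocks on one call at a time."""
    return 1000.0 / per_thread_rate(total_ops_per_sec, threads)


if __name__ == "__main__":
    for load in (100, 150):
        print(f"RS load {load}% -> {cpu_utilization(load):.1%} of {LOGICAL_CORES} cores")
    print(f"single get: {per_thread_rate(30_000):.0f} gets/s per thread, "
          f"about {implied_latency_ms(30_000):.1f} ms per call")
    print(f"multi-get:  {per_thread_rate(75_000):.0f} rows/s per thread")
```

Under those assumptions the RS sits at roughly 3% to 5% of its CPU capacity, and 30K single gets per second across 60 threads is 500 gets per second per thread, or about 2 ms per blocking call. That would be consistent with the client threads spending most of their time waiting on round trips rather than the RS being out of CPU.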
On Tue, Jul 30, 2013 at 1:35 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:

> FWIW nothing is happening in that thread dump.
>
> J-D
>
> On Tue, Jul 30, 2013 at 1:15 PM, Vladimir Rodionov
> <[EMAIL PROTECTED]> wrote:
> > Sure, here it is:
> >
> > http://pastebin.com/8TjyrKRT
> >
> > Isn't epoll used not only to read/write HDFS but also to connect/listen
> > to clients?
> >
> >
> > On Tue, Jul 30, 2013 at 12:31 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
> >
> >> Can you show us what the thread dump looks like when the threads are
> >> BLOCKED? There aren't that many locks on the read path when reading
> >> out of the block cache, and epoll would only happen if you need to hit
> >> HDFS, which you're saying is not happening.
> >>
> >> J-D
> >>
> >> On Tue, Jul 30, 2013 at 12:16 PM, Vladimir Rodionov
> >> <[EMAIL PROTECTED]> wrote:
> >> > I am hitting data in the block cache, of course. The data set is small
> >> > enough to fit comfortably into the block cache, and all requests are
> >> > directed to the same Region to guarantee single-RS testing.
> >> >
> >> > To Ted:
> >> >
> >> > Yes, it's CDH 4.3. What is the difference between 94.10 and 94.6 with
> >> > respect to read performance?
> >> >
> >> >
> >> > On Tue, Jul 30, 2013 at 12:06 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
> >> >
> >> >> That's a tough one.
> >> >>
> >> >> One thing that comes to mind is socket reuse. It used to come up more
> >> >> often, but this is an issue that people hit when doing loads of
> >> >> random reads. Try enabling tcp_tw_recycle but I'm not guaranteeing
> >> >> anything :)
> >> >>
> >> >> Also if you _just_ want to saturate something, be it CPU or network,
> >> >> wouldn't it be better to hit data only in the block cache? This way
> >> >> it has the lowest overhead?
> >> >>
> >> >> Last thing I wanted to mention is that yes, the client doesn't scale
> >> >> very well. I would suggest you give the asynchbase client a run.
> >> >>
> >> >> J-D
> >> >>
> >> >> On Tue, Jul 30, 2013 at 11:23 AM, Vladimir Rodionov
> >> >> <[EMAIL PROTECTED]> wrote:
> >> >> > I have been doing quite extensive testing of different read scenarios:
> >> >> >
> >> >> > 1. blockcache disabled/enabled
> >> >> > 2. data is local/remote (no good hdfs locality)
> >> >> >
> >> >> > and it turned out that I cannot saturate one RS using one client
> >> >> > host (comparable in CPU power and RAM):
> >> >> >
> >> >> > I am running a client app with 60 active read threads (with
> >> >> > multi-get) all going to one particular RS, and this RS's load is
> >> >> > 100-150% (out of 3200% available), which means the load is ~5%.
> >> >> >
> >> >> > All threads in the RS are either in BLOCKED (wait) or IN_NATIVE
> >> >> > (epoll) states.
> >> >> >
> >> >> > I attribute this to the HBase client implementation, which seems
> >> >> > not to be scalable (I am going to dig into the client later today).
> >> >> >
> >> >> > Some numbers: the maximum I could get from single gets (60
> >> >> > threads) is 30K per sec. Multi-get gives ~75K (60 threads).
> >> >> >
> >> >> > What are my options? I want to measure the limits, and I do not
> >> >> > want to run a cluster of clients against just ONE Region Server.
> >> >> >
> >> >> > RS config: 96 GB RAM, 16 (32) CPU
> >> >> > Client:    48 GB RAM,  8 (16) CPU
> >> >> >
> >> >> > Best regards,
> >> >> > Vladimir Rodionov
> >> >> > Principal Platform Engineer
> >> >> > Carrier IQ, www.carrieriq.com
> >> >> > e-mail: [EMAIL PROTECTED]
> >> >> >
> >> >> >
> >> >> > Confidentiality Notice:  The information contained in this message,
> >> >> including any attachments hereto, may be confidential and is intended
> >> to be
> >> >> read only by the individual or entity to whom this message is
> >> addressed. If
> >> >> the reader of this message is not the intended recipient or an agent