Re: Performance test results
Running the client on more than one server doesn't change the overall
results; the total number of requests just gets distributed across the
two clients.
I tried two things: inserting rows with one column each and inserting
rows with 100 columns each. In both cases the data was 1K per column,
so it does add up to 100K per row in the second test.
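For reference, the two write patterns amount to roughly the following
with the 0.90-era Java client (a minimal sketch; the table name
"perf_test", the column family "f" and the qualifiers are made up for
illustration, not the actual schema):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import java.util.Random;

public class WriteSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "perf_test");   // illustrative table name
    byte[] fam = Bytes.toBytes("f");                // illustrative family
    byte[] value = new byte[1024];                  // 1K per column, as in the test
    new Random().nextBytes(value);

    // Test 1: one 1K column per row.
    Put narrow = new Put(Bytes.toBytes("row-narrow-1"));
    narrow.add(fam, Bytes.toBytes("c0"), value);
    table.put(narrow);

    // Test 2: 100 columns of 1K each, i.e. ~100K per row.
    Put wide = new Put(Bytes.toBytes("row-wide-1"));
    for (int i = 0; i < 100; i++) {
      wide.add(fam, Bytes.toBytes("c" + i), value);
    }
    table.put(wide);

    table.close();
  }
}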
I guess my config is more or less standard: I have two masters and a
3-server ZK ensemble. Replication is enabled, but not for the table
I'm using for testing, and the other tables are not getting any
requests during this test. The only non-standard things are the new
memory slab (MSLAB) feature and the GC configuration recommended in
the recent Cloudera blog posts.
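Concretely, the non-default part boils down to something like this
(the property name is the MSLAB switch; the GC flags are only an
example in the spirit of those blog posts, not necessarily the exact
values in use):

<!-- hbase-site.xml: enable the MemStore-Local Allocation Buffer -->
<property>
  <name>hbase.hregion.memstore.mslab.enabled</name>
  <value>true</value>
</property>

# hbase-env.sh: CMS-style GC settings along the lines of the posts (illustrative values)
export HBASE_OPTS="$HBASE_OPTS -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70"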
I've attached the jstack dump from one of the RSes; it seems a lot of
threads are either parked or in the "epollWait" state.

Thanks for looking into it.

-eran

On Mon, Mar 28, 2011 at 17:38, Stack <[EMAIL PROTECTED]> wrote:
>
> On Mon, Mar 28, 2011 at 3:16 AM, Eran Kutner <[EMAIL PROTECTED]> wrote:
> > I started with a basic insert operation, inserting rows with one
> > column of 1KB of data each.
> > Initially, when the table was empty, I was getting around 300 inserts
> > per second with 50 writing threads. Then, when the region split and a
> > second server was added, the rate suddenly jumped to 3000 inserts/sec
> > per server, so ~6000 for the two servers. Over time, as more servers
> > were added, the rate actually went down and stabilized at around 2000
> > inserts/sec per server.
> >
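(For context, the writer described above is essentially N client
threads, each with its own HTable, doing Puts in a loop. A
stripped-down sketch, with illustrative names and row counts rather
than the actual test code:)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadedWriter {
  public static void main(String[] args) throws Exception {
    final Configuration conf = HBaseConfiguration.create();
    ExecutorService pool = Executors.newFixedThreadPool(50);   // 50 writing threads
    for (int t = 0; t < 50; t++) {
      pool.submit(new Runnable() {
        public void run() {
          try {
            // HTable is not thread-safe, so each thread gets its own instance.
            HTable table = new HTable(conf, "perf_test");
            byte[] fam = Bytes.toBytes("f");
            byte[] value = new byte[1024];
            Random rnd = new Random();
            rnd.nextBytes(value);
            for (int i = 0; i < 100000; i++) {       // row count is arbitrary here
              Put p = new Put(Bytes.toBytes("row-" + rnd.nextLong()));
              p.add(fam, Bytes.toBytes("c0"), value);
              table.put(p);
            }
            table.close();
          } catch (Exception e) {
            e.printStackTrace();
          }
        }
      });
    }
    pool.shutdown();
  }
}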
>
> What if you ran your client on more than one server?
>
> An insert is a single 1k cell?
>
> Tell us more about your configs.  Are you using defaults?  If you
> watch the logs during your upload, do you see much blocking?
>
> > I also conducted a random column read test, where I read a varying
> > number of columns from randomly selected rows. First I tested reading
> > only one specific column (the first in each row). It started at around
> > 60 r/s per server and gradually (I assume as more data was loaded into
> > the cache) increased to ~800 r/s per server.
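(For reference, each single-column read is essentially a Get
restricted to one qualifier; a sketch with the same illustrative names
as above:)

// Needs org.apache.hadoop.hbase.client.{HTable, Get, Result}, Bytes, java.io.IOException.
static byte[] readFirstColumn(HTable table, long rowId) throws IOException {
  // Fetch only the first column ("c0") of one randomly chosen row.
  Get get = new Get(Bytes.toBytes("row-" + rowId));
  get.addColumn(Bytes.toBytes("f"), Bytes.toBytes("c0"));
  Result result = table.get(get);
  return result.getValue(Bytes.toBytes("f"), Bytes.toBytes("c0"));
}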
>
> You can check the regionserver log.  It emits a cache stats log line
> every so often.  Check cache hit rate percentage.
>
> > When reading 5 random
> > columns from each row, the rate dropped to around 400 rows/sec, and
> > when fetching full rows (each with 100 columns) the rate remained
> > about the same, at 400 rows/sec per server.
> >
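(The 5-column and full-row variants differ only in what is added to
the Get; again illustrative names, with table, rowKey and rnd as in
the earlier sketches:)

// 5 random columns from one row: add each chosen qualifier to the Get
// (duplicate picks are not de-duplicated in this sketch).
Get fiveCols = new Get(rowKey);
for (int i = 0; i < 5; i++) {
  fiveCols.addColumn(Bytes.toBytes("f"), Bytes.toBytes("c" + rnd.nextInt(100)));
}
Result five = table.get(fiveCols);

// Full row (all 100 columns): request the whole family instead.
Get fullRow = new Get(rowKey);
fullRow.addFamily(Bytes.toBytes("f"));
Result all = table.get(fullRow);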
>
> 100 columns in a row is 100k, right?
>
> > I'm not sure exactly what I should expect, but I was hoping for much
> > higher numbers. I read somewhere that for small data it is reasonable
> > to expect 10K inserts per core per second. I know 1KB isn't small, but
> > these are 8-core machines and they are doing only about 2K inserts/sec
> > each. The read rate is also very low considering all the data should
> > fit in RAM. The interesting thing is that there doesn't seem to be any
> > resource bottleneck: IO utilization on the servers is negligible and
> > CPU is around 40-50%. The client generating the load is not loaded
> > either (around 5% CPU utilization), and client network was at 30%
> > utilization when reading full rows. So the only explanation I can see
> > for the flat-lining is some sort of lock contention. Does this make
> > sense?
> >
>
> This could be the case.  If you jstack during the reads, what are you
> seeing?  Are servers locked up waiting to pass a synchronization point
> or waiting on a lock?
>
> St.Ack