
Re: Performance test results
I must say, the more I play with it, the more baffled I am by the
results. I ran the read test again today, after not touching the
cluster for a couple of days, and now I'm getting the same high read
numbers (10-11K reads/sec per server, with some servers reaching even
15K reads/sec) whether I read 1, 10, 100 or even 1000 rows from every
key space. However, 5000 rows yielded a read rate of only 3K rows per
second, even after a very long time. Just to be clear, I'm always
randomly reading a single row in every request; the row counts I'm
talking about are the sizes of the ranges within each key space that
I'm randomly selecting my keys from.
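
To make the access pattern concrete, here is a minimal Python sketch of
how a client could pick the key for each single-row read. The key layout
("ks" + key space number + row number, 15 bytes in total) and the
function names are assumptions made for illustration only; the actual
key format used in the test is not given in this message. The count of
100 key spaces comes from the size estimate further down.

```python
import random

NUM_KEYSPACES = 100  # number of key spaces, as stated in the size estimate


def make_key(keyspace: int, row: int) -> bytes:
    """Build a key for one row.

    Hypothetical layout (not from the original test): 'ks', a 3-digit
    key space number and a 10-digit row number, 15 bytes in total.
    """
    return f"ks{keyspace:03d}{row:010d}".encode("ascii")


def pick_random_key(rows_per_keyspace: int, rng: random.Random) -> bytes:
    """Pick the key for one random single-row read.

    rows_per_keyspace is the size of the range (1, 10, 100, 1000, 5000)
    that keys are drawn from within each key space.
    """
    keyspace = rng.randrange(NUM_KEYSPACES)
    row = rng.randrange(rows_per_keyspace)
    return make_key(keyspace, row)


if __name__ == "__main__":
    rng = random.Random(42)
    for _ in range(3):
        print(pick_random_key(5000, rng))
```

With rows_per_keyspace set to 1 the client can only ever touch 100
distinct rows in total; with 5000 it can touch up to 500,000.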

St.Ack - to answer your questions:

Writing from two machines increased the total number of writes per
second by about 10%, maybe less. Reads showed a 15-20% increase when
run from two machines.

I already had most of the performance tuning recommendations
implemented (garbage collection, using the new memory slabs feature,
using LZO) when I ran my previous test. The only setting I didn't have
was "hbase.regionserver.handler.count". I changed it to 128, or 16
threads per core, which seems like a reasonable number, and tried
inserting into the same key ranges as before. It didn't seem to make
any difference in the total performance.

My keys are about 15 bytes long.

As for caching, I can't find those cache hit ratio numbers in my logs.
Do they require a special parameter to enable them? That said, my
calculations show that the entire data set I'm randomly reading should
easily fit in the servers' memory. Each row has 15 bytes of key + 128
bytes of data + overhead, so let's say 200 bytes. If I'm reading 5000
rows from each key space and have a total of 100 key spaces, that's
100*5000*200 = 100,000,000 B = 100 MB. This is spread across 5 servers
with 16GB of RAM, out of which 12.5GB are allocated to the region servers.
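
The estimate above can be checked with a few lines of Python. This is
only a sanity check of the arithmetic, not a measurement. The 200 bytes
per row is the rough guess from the paragraph above, and the sketch
assumes that the 12.5GB figure is per server and that GB means 10^9
bytes; the function name is made up for the example.

```python
def working_set_bytes(keyspaces: int, rows_per_keyspace: int,
                      bytes_per_row: int) -> int:
    """Total size of all rows the random reads can touch."""
    return keyspaces * rows_per_keyspace * bytes_per_row


KEYSPACES = 100
ROWS_PER_KEYSPACE = 5000
BYTES_PER_ROW = 200          # 15 B key + 128 B value + guessed overhead
SERVERS = 5
HEAP_PER_SERVER = 12.5e9     # assumption: 12.5GB per server, GB = 10^9 B

total = working_set_bytes(KEYSPACES, ROWS_PER_KEYSPACE, BYTES_PER_ROW)
cluster_heap = SERVERS * HEAP_PER_SERVER

if __name__ == "__main__":
    print(f"working set: {total} bytes ({total / 1e6:.0f} MB)")
    print(f"share of region server memory: {total / cluster_heap:.2%}")
```

Even if the per-row overhead were several times larger than the guess,
the working set would remain a small fraction of the memory allocated
to the region servers.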


On Tue, Apr 26, 2011 at 21:57, Stack <[EMAIL PROTECTED]> wrote:
> On Thu, Apr 21, 2011 at 5:13 AM, Eran Kutner <[EMAIL PROTECTED]> wrote:
> > I tested again on a clean table using 100 insert threads, each using a
> > separate keyspace within the test table. Every row had just one column
> > with 128 bytes of data.
> >
> > With one server and one region I got about 2300 inserts per second.
> > After manually splitting the region I got about 3600 inserts per
> > second (still on one machine). After a while the regions were balanced
> > and one was moved to another server, which brought writes to around 4500
> > per second. Additional splits and moves to more servers didn't
> > improve this number and the write performance stabilized at ~4000
> > writes/sec per server. This seems pretty low, especially considering
> > other numbers I've seen around here.
> >
> If you run your insert process on more than one box, do the numbers change?
> Nothing in http://hbase.apache.org/book.html#performance helps?
> What size are your keys?
> > Read performance is at around 1500 rows per second per server, which
> > seems extremely low to me, especially considering that the whole working
> > set I was querying could fit in the servers' memory. To make the test
> > interesting I limited my client to fetch only 1 row (always the same
> > one) from each keyspace, which yielded 10K reads per sec per server. So
> > I tried increasing the range again and read the same 10 rows; now the
> > performance dropped to 8500 reads/sec per server. Increasing the range
> > to 100 rows, the performance drops to around 3500 reads per second
> > per server.
> This result is interesting.  The cache logs its hit rate in the
> regionserver logs.  Are you seeing near 100% for 1 row, 10 rows, and 100
> rows?
> St.Ack