HBase, mail # user - Performance test results

Re: Performance test results
Eran Kutner 2011-04-21, 12:13
Hi J-D,
After stabilizing the configuration, with your great help, I was able
to go back to the load tests. I tried using IRC, as you suggested, to
continue this discussion, but because of the time difference (I'm
GMT+3) it is quite difficult to find a time when people are present
and I am available to run long tests, so I'll give the mailing list
one more try.

I tested again on a clean table using 100 insert threads, each
writing to a separate keyspace within the test table. Every row had
just one column with 128 bytes of data.
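
For reference, each insert thread did roughly the following (a
simplified sketch of my test client on the 0.90-era Java API; the
table, family and qualifier names are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class InsertThread implements Runnable {
    private final int threadIndex; // the "i" in the stream<i>_<c> keys

    public InsertThread(int threadIndex) {
        this.threadIndex = threadIndex;
    }

    public void run() {
        try {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "test_table"); // placeholder
            byte[] value = new byte[128]; // one column, 128 bytes of data
            long c = 1;
            while (!Thread.currentThread().isInterrupted()) {
                // each thread appends to the tail of its own keyspace
                Put put = new Put(Bytes.toBytes("stream" + threadIndex + "_" + c++));
                put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), value);
                table.put(put);
            }
            table.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
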
With one server and one region I got about 2300 inserts per second.
After manually splitting the region I got about 3600 inserts per
second (still on one machine). After a while the regions were
rebalanced and one was moved to another server, which brought writes
up to around 4500 per second. Additional splits and moves to more
servers didn't improve this number, and the write performance
stabilized at ~4000 writes/sec per server. This seems pretty low,
especially considering other numbers I've seen around here.
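
(The manual splits went through the admin API; roughly, assuming the
same 0.90-era client and placeholder table name:)

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
admin.split("test_table"); // asks the servers to split the table's regions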

Read performance is at around 1500 rows per second per server, which
seems extremely low to me, especially considering that the entire
working set I was querying could fit in the servers' memory. To make
the test interesting I limited my client to fetch only 1 row (always
the same one) from each keyspace; that yielded 10K reads per second
per server. I then increased the range to read the same 10 rows, and
performance dropped to 8500 reads/sec per server. Increasing the
range to 100 rows dropped performance to around 3500 reads per second
per server.
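
The read side of the client looks roughly like this (again a sketch
with placeholder names; the row limit is what I varied between 1, 10
and 100, and "stream7_" stands for one thread's keyspace prefix):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ReadTest {
    public static void main(String[] args) throws Exception {
        HTable table = new HTable(HBaseConfiguration.create(), "test_table");
        int limit = 10; // 1, 10 or 100 rows in the tests above
        while (true) { // each reader thread loops over its range
            // bounded scan at the head of one keyspace; '~' sorts after digits
            Scan scan = new Scan(Bytes.toBytes("stream7_"), Bytes.toBytes("stream7_~"));
            ResultScanner scanner = table.getScanner(scan);
            int read = 0;
            for (Result r = scanner.next(); r != null && read < limit; r = scanner.next()) {
                read++; // only counting rows, not touching the values
            }
            scanner.close();
        }
    }
}
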
Do you have any idea what could explain this behavior, and how can I
get a decent number of reads from those servers?

-eran

On Thu, Mar 31, 2011 at 20:27, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
>
> Inline.
>
> J-D
>
> > I assume the block cache tuning key you talk about is
> > "hfile.block.cache.size", right? If it is only 20% by default, then
> > what is the rest of the heap used for? Since there are no fancy
> > operations like joins, and since I'm not using memory tables, the only
> > thing I can think of is the memstore, right? What is the recommended
> > value for the block cache?
>
> By default a max of 40% of the heap is reserved for MemStores; the rest
> is used to answer queries, do compactions, flushes, etc. It's very
> conservative, but people still find ways to OOME with very big cells
> sometimes :)
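>
> In hbase-site.xml those two knobs look like this (values shown are the
> defaults in this era; set them on the region servers):
>
> <property>
>   <name>hfile.block.cache.size</name>
>   <value>0.2</value> <!-- fraction of heap for the block cache -->
> </property>
> <property>
>   <name>hbase.regionserver.global.memstore.upperLimit</name>
>   <value>0.4</value> <!-- max fraction of heap for all MemStores -->
> </property>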
>
> >
> > As for the regions layout, right now the table in discussion has 264
> > regions more or less evenly distributed among the 5 region servers.
> > Let me know what other information I can provide.
>
> That's fine, but more important is the layout during the test. It can
> be tricky to benchmark a "real life workload" if you just did the
> import, because it takes some time for the dust to settle. One example
> among many others: the balancer only runs every few minutes, so if
> you're doing a massive insert and then read, the load might only be on
> two machines.
>
> >
> > The key space is as follows: I launch n threads, each thread writes
> > keys that look like "streami_c" where "i" is the thread index (1-n)
> > and "c" is a counter that goes up from 1 until I stop the test. I
> > understand that each thread is only writing to the tail of its own
> > keyspace so only "n" region files can be used, however if that was the
> > limitation then adding more threads each with its own keyspace should
> > have increased the throughput.
>
> And can you tell by the start/stop keys that those threads do hit
> different regions? I understand you wouldn't have to worry about that
> too much in a real-life scenario, but since yours is artificial, who
> knows how it ended up.
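>
> For example, something along these lines (a rough sketch, assuming the
> 0.90-era client API and the placeholder table name from above) would
> print the key range each region covers:
>
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.client.HTable;
> import org.apache.hadoop.hbase.util.Bytes;
> import org.apache.hadoop.hbase.util.Pair;
>
> HTable table = new HTable(HBaseConfiguration.create(), "test_table");
> Pair<byte[][], byte[][]> keys = table.getStartEndKeys();
> for (int i = 0; i < keys.getFirst().length; i++) {
>   // toStringBinary makes the binary keys readable
>   System.out.println("region " + i + ": ["
>       + Bytes.toStringBinary(keys.getFirst()[i]) + ", "
>       + Bytes.toStringBinary(keys.getSecond()[i]) + ")");
> }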
>
> To speed up this discussion, feel free to drop by our IRC channel on
> freenode; very often we're able to find issues much faster there,
> using less of everyone's time (and then report the findings here).
>
> J-D