HBase >> mail # user >> Linear Scalability in HBase
RE: Linear Scalability in HBase
You cannot saturate a region server with a single client (unless, perhaps, you use hbase-async) when all data is cached in RAM.
In our performance tests we had to run 10 clients (on different hosts) with 30 threads each to max out one RS when all data
is in cache (block cache, page cache, etc.).
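For reference, a run like the one described above could be driven by launching YCSB on each of the 10 client hosts with 30 threads apiece. This is only a sketch: the workload file, binding name, and table properties are illustrative placeholders, not taken from this thread.

```shell
#!/bin/sh
# Sketch: the command each of the 10 client hosts would run, plus the
# aggregate concurrency hitting the single region server. Workload and
# table settings below are assumed, not from the thread.
THREADS_PER_CLIENT=30
CLIENTS=10
CMD="bin/ycsb run hbase10 -P workloads/workloadc \
  -threads ${THREADS_PER_CLIENT} -p table=usertable -p columnfamily=cf"
echo "per-host command: $CMD"
echo "total threads: $((THREADS_PER_CLIENT * CLIENTS))"
```

With these numbers, the region server sees 300 concurrent request threads in total, which is the kind of aggregate concurrency needed to saturate it when everything is served from cache.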

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: [EMAIL PROTECTED]

________________________________________
From: Ramu M S [[EMAIL PROTECTED]]
Sent: Friday, October 25, 2013 9:35 AM
To: [EMAIL PROTECTED]
Subject: Re: Linear Scalability in HBase

Hi,

For me, scalability means achieving the same throughput and latency as the
number of clients increases.

In my case the data set grows with the number of clients; that's the
reason I vary both the clients and the region servers.

I'm trying to identify how the cluster should grow to handle data from more
clients so that operation throughput and latency stay within defined
limits.

Currently the limit is 15K OPS throughput and 1 ms latency.

To test this, I have kept the data growth at around 15 million records per server.

Each YCSB client actually runs 32 threads, so each additional client means
roughly 15 million more records and 32 more client threads.
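The growth rule above (each added YCSB client brings 32 threads and about 15 million rows, one region server's worth) can be written down as a quick sanity check. A minimal sketch, assuming only the per-client figures stated in this thread:

```shell
#!/bin/sh
# Sketch of the scaling arithmetic from this thread: each added YCSB
# client contributes 32 threads and ~15 million rows, matched by one
# more region server.
THREADS_PER_CLIENT=32
ROWS_PER_SERVER=15000000
CLIENTS=4   # e.g. a 4-client / 4-region-server configuration
echo "client threads: $((CLIENTS * THREADS_PER_CLIENT))"
echo "total rows:     $((CLIENTS * ROWS_PER_SERVER))"
```

Holding this ratio fixed is what makes the runs comparable: both the offered load and the data volume scale in lockstep with the number of region servers.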

All machines are physical servers.

1) Read and write latency is around 1 ms in the first case, whereas in the
second case it is slightly higher, at 1.1 to 1.2 ms.

2) Keeping the same number of clients as in the first case, the latency
dropped to 0.7 ms, but throughput came down further to just 9K OPS.

For these tests, I'm running both the clients and the region servers on the
same machines. In the 8-server scenario I also tried running the clients on
separate machines, but the results were almost the same as when clients ran
on the same machines.

Ganglia shows that system load is around 30% in both scenarios.

What I want to understand is how the cluster should grow to meet both the
throughput and the latency targets.

Regards,
Ramu
