Hari Sreekumar 2011-03-26, 17:01
Hi guys,
On what factors does HBase read latency primarily depend? What would be the approximate theoretical limit for read latency in v0.90.1 on a cluster of 7 nodes (16 core/16 GB RAM on 5 machines and 36 GB on the other two)?

I have an application where I generate around 1000 rows/s to be input into HBase. Then I have to read this data and process it at regular intervals. Write speed is not a problem, as the cluster seems to be able to write at the required rate. But while processing this data, I would also need a read speed of at least 1000 rows/s, since I need to keep the processing speed at least equal to the data generation speed. So far, I am getting around 200-300 rows/s only it seems. I have LZO compression on the tables, and I haven't tried in-memory yet as my RAM usage is already too high while running jobs.

Is it possible to achieve this read speed, and what can I do to improve it? How far can adding more nodes or more RAM help? Please let me know if the scope of this question is too large to answer, or if you need more details.
Thanks, Hari
Amandeep Khurana 2011-03-26, 19:03
What is your typical row size? How many column families? How many columns in each family?
Ted Dunning 2011-03-26, 22:56
This sounds like you are being limited by sequentially reading records in a single thread with multiple queries.
Can you say more about what kind of read you're doing and about the structure of the program initiating the reads?
On Sat, Mar 26, 2011 at 10:01 AM, Hari Sreekumar <[EMAIL PROTECTED]> wrote:
> So far, I am getting around 200-300 rows/s only it seems.
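
To make the single-thread point concrete, here is a rough back-of-the-envelope sketch in Python. It assumes each read is a blocking round trip and that a thread issues them back to back, so throughput is simply round trips per second times rows per round trip times threads. The 4 ms latency is an assumption inferred from the reported 200-300 rows/s, not a measured value, and the function names are hypothetical. The model ignores server-side limits and the fact that larger batches take longer per round trip, so treat the results as optimistic upper bounds.

```python
import math


def rows_per_second(rpc_latency_ms, rows_per_rpc=1, threads=1):
    """Estimate read throughput for blocking, back-to-back round trips.

    Idealized model: every round trip costs the same latency regardless
    of how many rows it carries, and threads do not contend.
    """
    rpcs_per_second_per_thread = 1000.0 / rpc_latency_ms
    return rpcs_per_second_per_thread * rows_per_rpc * threads


def threads_needed(target_rows_per_s, rpc_latency_ms, rows_per_rpc=1):
    """Smallest thread count that reaches the target under the same model."""
    per_thread = rows_per_second(rpc_latency_ms, rows_per_rpc)
    return math.ceil(target_rows_per_s / per_thread)


ASSUMED_LATENCY_MS = 4.0  # assumption: roughly matches 200-300 rows/s at one row per round trip

one_row_one_thread = rows_per_second(ASSUMED_LATENCY_MS)
four_threads = rows_per_second(ASSUMED_LATENCY_MS, threads=4)
batched = rows_per_second(ASSUMED_LATENCY_MS, rows_per_rpc=100)

print(one_row_one_thread, four_threads, batched)
print(threads_needed(1000, ASSUMED_LATENCY_MS))
```

Under these assumptions, one row per round trip in one thread lands in the reported range, while either a handful of parallel readers or fetching many rows per round trip (for example via scanner caching) would be enough to pass 1000 rows/s on paper.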