HBase >> mail # user >> help on key design


Demian Berjman 2013-07-30, 20:37
Re: help on key design
If all your keys are grouped together, why don't you use a scan with start/end keys specified? A sequential scan can theoretically be faster than MultiGet lookups. Assuming your grouping is tight, you can also use filters with the scan to get better performance.
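For illustration, such a scan could look roughly like this with the 0.94-era Java client (the table name, key prefix and stop row are made-up placeholders, not anything from this thread):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class GroupScanExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");          // hypothetical table name
    try {
      // Scan only the contiguous slice of row keys that belongs to one "group".
      Scan scan = new Scan(Bytes.toBytes("group42:"),    // start row (inclusive), placeholder prefix
                           Bytes.toBytes("group42:~"));  // stop row (exclusive), placeholder
      scan.setCaching(500);                              // fetch more rows per RPC round trip
      // scan.setFilter(...) could narrow the results further, as suggested above.
      ResultScanner scanner = table.getScanner(scan);
      try {
        for (Result r : scanner) {
          System.out.println(Bytes.toString(r.getRow()));
        }
      } finally {
        scanner.close();
      }
    } finally {
      table.close();
    }
  }
}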

How much memory do you have for your region servers? Have you enabled block caching? Is your CPU spiking on your region servers?

If you are saturating the resources on your *hot* region server, then yes, having more region servers will help. If not, then something else is the bottleneck and you probably need to dig further.
Regards,
Dhaval
________________________________
From: Demian Berjman <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Tuesday, 30 July 2013 4:37 PM
Subject: help on key design
Hi,

I would like to explain our use case of HBase, the row key design, and the
problems we are having, so anyone can give us a hand:

The first thing we noticed is that our data set is quite small compared to
other cases we have read about on the list and in forums. We have a table
containing 20 million keys, split automatically by HBase into 4 regions and
balanced across 3 region servers. We have designed our key to keep together the
set of keys requested by our app. That is, when we request a set of keys we
expect them to be grouped together to improve data locality and block cache
efficiency.

The second thing we noticed, compared to other cases, is that we retrieve a
bunch of keys per request (approx. 500). Thus, during our peaks (3k requests per
minute), we have a lot of requests going to a particular region server and
asking for a lot of keys. That results in poor response times (on the order of
seconds). Currently we are using multi gets.
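Roughly, a multi get of that kind looks like this in the Java client (the helper and names here are only illustrative, not our actual code):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class MultiGetExample {
  // Issue one batched multi get for the ~500 keys a single request needs.
  static Result[] fetch(HTableInterface table, List<String> keys) throws IOException {
    List<Get> gets = new ArrayList<Get>(keys.size());
    for (String key : keys) {
      gets.add(new Get(Bytes.toBytes(key)));
    }
    return table.get(gets);   // the client groups the Gets by region server internally
  }
}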

We think an improvement would be to spread the keys (by introducing a
randomized component into them) across more region servers, so each region
server will have to handle fewer keys and probably fewer requests. That way the
multi gets will be spread over the region servers.
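Something like the following one-byte salt is what we have in mind (the bucket count and key layout are just an assumption for illustration):

import java.util.Arrays;
import org.apache.hadoop.hbase.util.Bytes;

public class SaltedKey {
  private static final int BUCKETS = 16;   // assumed bucket count; roughly >= number of region servers

  // Prefix the original key with a deterministic one-byte salt so rows spread across regions.
  static byte[] salt(byte[] originalKey) {
    int bucket = (Arrays.hashCode(originalKey) & 0x7fffffff) % BUCKETS;
    return Bytes.add(new byte[] { (byte) bucket }, originalKey);
  }
}

Reads would have to apply the same salt before each Get, and a sequential scan over one logical group would then need up to BUCKETS separate scans, which is part of what question 2 below is asking about.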

Our questions:

1. Is it correct to ask for so many keys in each request (if you need high
performance)?
2. What about splitting across more region servers? Is that a good idea? How
could we accomplish this? We thought of applying some hashing...

Thanks in advance! 
Other messages in this thread:
Ted Yu 2013-07-30, 22:45
Pablo Medina 2013-07-31, 14:24
Demian Berjman 2013-07-31, 15:12
Dhaval Shah 2013-07-31, 17:14
Demian Berjman 2013-07-31, 18:41
Dhaval Shah 2013-07-31, 18:59
Ted Yu 2013-07-31, 17:49
Michael Segel 2013-07-31, 18:41
Pablo Medina 2013-07-31, 18:57
Michael Segel 2013-07-31, 19:32
Pablo Medina 2013-07-31, 19:39
Pablo Medina 2013-07-31, 18:00
Ted Yu 2013-07-31, 20:08