HBase, mail # user - Questions while evaluating HBase

Questions while evaluating HBase
Eran Kutner 2010-03-04, 10:02
I'm evaluating HBase as a NoSQL DB for a large-scale, interactive web
service with very high uptime requirements, and have a few questions to the
   1. I assume you've seen this benchmark by Yahoo (
   http://www.brianfrankcooper.net/pubs/ycsb-v4.pdf and
   http://www.brianfrankcooper.net/pubs/ycsb.pdf). They show three main
   problems: latency rises significantly as the operation rate increases,
   operations/sec are capped at about half that of the other tested platforms,
   and adding new nodes interrupts the normal operation of the cluster for a while.
   Do you consider these results a problem and if so are there any plans to
   address them?
   2. While running our tests (most were done using 0.20.2) we had a few
   incidents where a table went into "transition" and never came out of it.
   We had to restart the cluster to release the stuck tables. Is this a common
   3. If I understand correctly, any major upgrade requires completely
   shutting down the cluster during the upgrade, as well as deploying a new
   version of the application compiled against the new client version. Did I
   get that right? Is there any strategy for upgrading while the cluster is still
   4. This is more a bug report than a question, but it seems that in 0.20.3
   the master server doesn't stop cleanly and has to be killed manually. Is
   anyone else seeing this too?
   5. Are there any performance benchmarks for the Thrift gateway? Do you
   have an estimate of the performance penalty of using the gateway compared to
   using the native API?
   6. Right now, my biggest concern about HBase is its administration
   complexity and cost. If anyone can share their experience that would be a
   huge help. How many servers do you have in the cluster? How much ongoing
   effort does it take to administer it? What uptime levels are you seeing
   (including upgrades)? Do you have any good strategy for running one cluster
   across two data centers, or replicating between two clusters in two
   different DCs? Did you have any serious problems/crashes/downtime with
Thanks a lot,
Eran Kutner