HBase >> mail # user >> Hbase cluster for serving real time site traffic


Hbase cluster for serving real time site traffic
Hi,

We are planning to experiment with a cluster for serving production traffic
using HBase for Pinterest. We are starting off with a cluster of 10 region
servers + 1 master on Amazon EMR, which runs HBase 0.92. I have some very
naive questions (primarily around points of failure):

1) It seems HBase starts only one ZooKeeper instance, on the master node,
which is critical for operation. How many ZooKeeper nodes should I use, and
can I run them on the region servers?
2) How many masters should we use? Does HBase support multiple masters
(primary and secondary) within the same cluster? From my understanding,
master availability is not critical for operation.
3) NameNode - We are running hadoop 0.8. I have read that the NameNode is a
single point of failure and that we should really be running two NameNodes
so we can fail over. Is it fine to run these on the region servers?
4) Our current application involves long row/column keys (24-32 bytes) with
values of 0-1 bytes. Should we be using a different key encoding than the
default encoding? What advantages could it buy us?
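
To make question 4 concrete, here is a rough sketch of how much of each stored
cell would be key rather than value. It assumes the standard uncompressed
KeyValue layout (4-byte key length, 4-byte value length, 2-byte row length,
row, 1-byte family length, family, qualifier, 8-byte timestamp, 1-byte type,
value). The function names and the 1-byte column family are assumptions made
for illustration, as is reading "24-32 bytes" as applying to both the row key
and the column qualifier.

```python
# Rough per-cell size of an uncompressed HBase KeyValue, before any
# block encoding or compression is applied.

def keyvalue_size(row_len, family_len, qualifier_len, value_len):
    """Return the serialized size in bytes of one KeyValue."""
    # Key part: 2B row length + row + 1B family length + family
    #           + qualifier + 8B timestamp + 1B key type.
    key_len = 2 + row_len + 1 + family_len + qualifier_len + 8 + 1
    # Whole cell: 4B key length + 4B value length + key + value.
    return 4 + 4 + key_len + value_len


def overhead_fraction(row_len, family_len, qualifier_len, value_len):
    """Return the fraction of the cell that is not value bytes."""
    total = keyvalue_size(row_len, family_len, qualifier_len, value_len)
    return (total - value_len) / total


if __name__ == "__main__":
    # Hypothetical cell: 32-byte row, 1-byte family, 32-byte qualifier,
    # 1-byte value.
    size = keyvalue_size(32, 1, 32, 1)
    frac = overhead_fraction(32, 1, 32, 1)
    print(f"{size} bytes per cell, {frac:.1%} of it is not value data")
```

Under these assumptions almost the entire cell is key material, which is why
we are asking whether a different key encoding would help.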

We are currently using Amazon EMR for testing purposes, which runs HBase
0.92. If it works well, we would like to configure our own cluster, probably
with the latest version of HBase, which appears to be 0.94 at the moment.

Thanks
Varun