HBase, mail # user - Hbase cluster for serving real time site traffic

Re: Hbase cluster for serving real time site traffic
Marcos Ortiz 2012-10-30, 20:20
Regards, Varun. Answers inline.
On 10/30/2012 01:03 PM, Varun Sharma wrote:
> Thanks for the tips.
> So, yes, secondary NameNode is probably more critical than the secondary
> master - since the master is only responsible for metadata changes/region
> splits/table creation etc and not for writes/reads.
Exactly, you have to create a good HA strategy for these nodes (Master
and Secondary Master)

> Regarding the keys question - i meant that the (row + column) length is
> 24-32 bytes and the value length is 0-1 bytes. Currently, we have a cluster
> running with all the data loaded into hbase but it all runs with default
> settings.
There are many areas that you can optimize in an HBase cluster:
- Write operations
- Compactions and Split optimization
- Region Servers size
- Snappy compression
- Schema design
- Use of block caching for scan optimization
- Use of asynchronous clients for HBase operations (asynchbase for

Lars George's excellent book, "HBase: The Definitive Guide", has a complete
chapter on this tricky topic (Chapter 11).
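
As a small illustration of the split optimization item above, here is a
minimal sketch of computing split points for a pre-split table. It assumes
the row keys are spread roughly uniformly over the byte space (for example
because they start with a hash), which may not hold for your keys. The
function name split_keys and the key_len parameter are hypothetical, chosen
only for this example.

```python
def split_keys(num_regions, key_len=4):
    """Return num_regions - 1 evenly spaced split points.

    Each split point is a fixed-width, big-endian byte string of key_len
    bytes. Assumption: row keys are uniformly distributed over the key
    space, so evenly spaced boundaries give evenly loaded regions.
    """
    if num_regions < 2:
        # A single region needs no split points.
        return []
    space = 256 ** key_len  # number of distinct key prefixes of this width
    return [
        (space * i // num_regions).to_bytes(key_len, "big")
        for i in range(1, num_regions)
    ]


if __name__ == "__main__":
    # One boundary per line, in hex, for a 10-region table.
    for key in split_keys(10):
        print(key.hex())
```

The resulting byte strings can be supplied as split keys when the table is
created, so the regions are spread over the region servers from the start
instead of all writes landing on a single region.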

Some additional resources:

[1] https://github.com/stumbleupon/asynchbase

Look on SlideShare at the presentations tagged from the last HBaseCon, for
example Benoit's talk "Lessons learned from OpenTSDB" and Lars Hofhansl's
"HBase Schema Design":

Best wishes
> Thanks
> Varun
> On Tue, Oct 30, 2012 at 10:53 AM, Jean-Marc Spaggiari <
>> My 2¢.
>> 1) You need an odd number of ZooKeeper nodes. So 3 is the minimum
>> recommended for production.
>> 2) Yes, you have Master and SecondaryMaster. And it's also recommended
>> to have one of each. And the master is critical. If you lose
>> it, you lose your cluster.
>> 3) NameNode is hadoop, not hbase. You should follow hadoop
>> recommendations. Like you have secondarymaster, you have
>> secondarynamenode. So I think you should have as many
>> secondarynamenode as you have secondarymaster (on the same machine?).
>> 4) I'm not sure I understand this question. Keys are binary. Array
>> of bytes. So 32 0-1 bytes is a 3 bytes long array. It's not a lot.
>> This will only give you 2^32 different rows. You will have to
>> pre-split them, or you will end with almost all of them on the same
>> regionserver?
>> JM
>> 2012/10/30, Varun Sharma <[EMAIL PROTECTED]>:
>>> Hi,
>>> We are planning to experiment with a cluster for serving production traffic
>>> using hbase for pinterest. We are starting off with a 10 region server + 1
>>> master cluster on Amazon EMR version 0.92. I had some very naive questions
>>> (primarily around points of failure):
>>> 1) It seems hbase starts only one zookeeper on the master node - which is
>>> critical for operation - how many zookeepers should I use and can I run
>>> those on the region servers?
>>> 2) How many masters to use - does hbase support multiple masters (primary
>>> and secondary) within the same cluster? From my understanding, master
>>> availability is not critical for operation.
>>> 3) NameNode - We are running hadoop 0.8 - I have read that NameNode is a
>>> single point of failure and we should really be running two name node(s) so
>>> we can failover. Is it fine to run these on the region servers?
>>> 4) Our current application involves long row/column - 24-32 bytes with 0-1
>>> bytes of values. Should we be using a different key encoding than the
>>> default encoding? What advantages could it buy us?
>>> We are currently using amazon EMR for testing purposes which runs hbase
>>> 0.92. If it works well, we would like to configure our own cluster with
>>> probably the latest version of hbase which appears to be 0.94 at the
Marcos Luis Ortíz Valmaseda
about.me/marcosortiz <http://about.me/marcosortiz>
@marcosluis2186 <http://twitter.com/marcosluis2186>