HBase, mail # user - Hbase cluster for serving real time site traffic


Re: Hbase cluster for serving real time site traffic
Marcos Ortiz 2012-10-30, 20:20
Hi Varun, answers inline.
On 10/30/2012 01:03 PM, Varun Sharma wrote:
> Thanks for the tips.
>
> So, yes, secondary NameNode is probably more critical than the secondary
> master - since the master is only responsible for metadata changes/region
> splits/table creation etc and not for writes/reads.
Exactly. You need a good HA strategy for these nodes (Master
and Secondary Master).
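
On the HA side, the "odd number of ZooKeeper nodes" advice quoted further
down follows from majority arithmetic: an ensemble stays available only
while a strict majority of its servers is up. A minimal sketch in plain
Python (the function names are mine, and nothing here talks to ZooKeeper):

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble of the given size."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Print ensemble size, quorum size and tolerated failures side by side.
    for n in range(1, 8):
        print(n, quorum_size(n), tolerated_failures(n))
```

By this arithmetic a fourth server tolerates no more failures than three
do, which is why ensembles are usually sized 3, 5 or 7.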

>
> Regarding the keys question - i meant that the (row + column) length is
> 24-32 bytes and the value length is 0-1 bytes. Currently, we have a cluster
> running with all the data loaded into hbase but it all runs with default
> settings.
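With keys that long and values that short, almost every byte stored per
cell is key. A rough sketch of the arithmetic, assuming the uncompressed
KeyValue layout (4-byte key length, 4-byte value length, 2-byte row length,
1-byte family length, 8-byte timestamp, 1-byte type) and a one-byte column
family name; the family length is my assumption, not something you stated:

```python
# Fixed bytes per cell: key length, value length, row length,
# family length, timestamp, key type.
FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1


def keyvalue_size(row_plus_qualifier: int, value: int, family: int = 1) -> int:
    """Approximate uncompressed size in bytes of one cell."""
    return FIXED_OVERHEAD + row_plus_qualifier + family + value


if __name__ == "__main__":
    for key_len, val_len in [(24, 0), (32, 1)]:
        total = keyvalue_size(key_len, val_len)
        print(f"key={key_len}B value={val_len}B -> cell={total}B")
```

This ignores block indexes, compression and any block encoding, so treat
it as an estimate of the raw cell size only.
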
There are many areas that you can optimize in an HBase cluster:
- Write operations
- Compaction and split optimization
- Region server sizing
- Snappy compression
- Schema design
- Use of block caching for scan optimization
- Use of asynchronous clients for HBase operations (asynchbase, for
example [1])
etc.
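
For the split item, and the pre-splitting point Jean-Marc raises in the
quoted message below, here is a minimal sketch of computing evenly spaced
split keys. It assumes row keys are uniformly distributed over their
leading bytes (for example hashed or salted keys), which is my assumption
and not something stated in this thread; `split_points` and `key_width`
are hypothetical names.

```python
from typing import List


def split_points(num_regions: int, key_width: int = 4) -> List[bytes]:
    """Return num_regions - 1 evenly spaced split keys.

    Each key is a big-endian prefix of key_width bytes, so the keys
    sort in the same order as the integers they were built from.
    """
    if num_regions < 2:
        return []
    space = 256 ** key_width  # number of distinct prefixes
    return [
        (space * i // num_regions).to_bytes(key_width, "big")
        for i in range(1, num_regions)
    ]


if __name__ == "__main__":
    # One region per region server in a 10-server cluster.
    for key in split_points(10):
        print(key.hex())
```

The resulting byte arrays would then be supplied as the split keys when
the table is created, for example through
HBaseAdmin.createTable(HTableDescriptor, byte[][]) in the Java client.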

Lars George's excellent book, "HBase: The Definitive Guide", has a complete
chapter on this tricky topic (Chapter 11).

Some additional resources:

[1] https://github.com/stumbleupon/asynchbase
https://github.com/twitter/finagle
http://gbif.blogspot.com/2012/02/performance-evaluation-of-hbase.html
http://gbif.blogspot.com/2012/02/monitoring-hadoop-and-hbase.html
http://www.cloudera.com/blog/2011/04/hbase-dos-and-donts/

Also look at the presentations tagged for the last HBaseCon on SlideShare,
for example Benoit's talk
"Lessons learned from OpenTSDB" and Lars Hofhansl's "HBase Schema Design":
http://www.slideshare.net/cloudera/tag/hbasecon-2012

Best wishes
>
> Thanks
> Varun
>
> On Tue, Oct 30, 2012 at 10:53 AM, Jean-Marc Spaggiari <
> [EMAIL PROTECTED]> wrote:
>
>> My 2¢.
>>
>> 1) You need an odd number of ZooKeeper nodes. So 3 is the minimum
>> recommended for production.
>> 2) Yes, you have Master and SecondaryMaster. And it's also recommended
>> to have one of each. And the master is critical. If you lose
>> it, you lose your cluster.
>> 3) NameNode is hadoop, not hbase. You should follow hadoop
>> recommendations. Like you have secondarymaster, you have
>> secondarynamenode. So I think you should have as many
>> secondarynamenode as you have secondarymaster (on the same machine?).
>> 4) I'm not sure I understand this question. Keys are binary: arrays
>> of bytes. So 32 0-1 bytes is a 3 bytes long array. It's not a lot.
>> This will only give you 2^32 different rows. You will have to
>> pre-split them, or you will end up with almost all of them on the same
>> regionserver?
>>
>> JM
>>
>> 2012/10/30, Varun Sharma <[EMAIL PROTECTED]>:
>>> Hi,
>>>
>>> We are planning to experiment with a cluster for serving production
>> traffic
>>> using hbase for pinterest. We are starting off with a 10 region server +
>> 1
>>> master cluster on Amazon EMR version 0.92. I had some very naive
>> questions
>>> (primarily around points of failure):
>>>
>>> 1) It seems hbase starts only one zookeeper on the master node - which is
>>> critical for operation - how many zookeepers should I use and can I run
>>> those on the region servers ?
>>> 2) How many masters to use - does hbase support multiple masters (primary
>>> and secondary) within the same cluster ? From my understanding, master
>>> availability is not critical for operation.
>>> 3) NameNode - We are running hadoop 0.8 - I have read that NameNode is a
>>> single point of failure and we should really be running two name node(s)
>> so
>>> we can failover. Is it fine to run these on the region servers ?
>>> 4) Our current application involves long row/column - 24-32 bytes with
>> 0-1
>>> bytes of values. Should we be using a different key encoding than the
>>> default encoding ? What advantages could it buy us ?
>>>
>>> We are currently using amazon EMR for testing purposes which runs hbase
>>> 0.92. If it works well, we would like to configure our own cluster with
>>> probably the latest version of hbase which appears to be 0.94 at the
Marcos Luis Ortíz Valmaseda
about.me/marcosortiz <http://about.me/marcosortiz>
@marcosluis2186 <http://twitter.com/marcosluis2186>