Varun Sharma 2012-10-30, 17:41
Jean-Marc Spaggiari 2012-10-30, 17:53
Varun Sharma 2012-10-30, 18:03
Regards, Varun. Answers inline.
On 10/30/2012 01:03 PM, Varun Sharma wrote:
> Thanks for the tips.
> So, yes, secondary NameNode is probably more critical than the secondary
> master - since the master is only responsible for metadata changes/region
> splits/table creation etc and not for writes/reads.
Exactly. You need a good HA strategy for these nodes (Master
and Secondary Master).
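
Since master failover is coordinated through ZooKeeper, the HA strategy also depends on how many ZooKeeper servers can fail before the ensemble loses its majority. Here is a minimal sketch of that arithmetic. The function name `tolerated_failures` is only illustrative, not part of any HBase or ZooKeeper API:

```python
def tolerated_failures(ensemble_size: int) -> int:
    """Servers that may fail while a strict majority of the ensemble survives."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    quorum = ensemble_size // 2 + 1  # strict majority needed to stay available
    return ensemble_size - quorum


if __name__ == "__main__":
    for n in (1, 3, 4, 5):
        print(n, "servers ->", tolerated_failures(n), "failure(s) tolerated")
```

A fourth server tolerates no more failures than three do, which is why odd ensemble sizes are the usual recommendation.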
> Regarding the keys question - I meant that the (row + column) length is
> 24-32 bytes and the value length is 0-1 bytes. Currently, we have a cluster
> running with all the data loaded into hbase but it all runs with default
There are many areas that you can optimize in an HBase cluster:
- Write operations
- Compactions and Split optimization
- Region Servers size
- Snappy compression
- Schema design
- Use of block caching for scan optimization
- Use of asynchronous clients for HBase operations (asynchbase for
Lars George's excellent book, "HBase: The Definitive Guide", has a complete
chapter on this tricky topic (Chapter 11).
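
To see why schema design and compression matter for your case (long keys, tiny values), here is a rough sketch of the size of one cell before any compression or block encoding. It assumes the plain KeyValue layout used by HBase releases of this era: 4-byte key length, 4-byte value length, 2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte type. The 24/8 split between row and column qualifier and the one-byte family name are hypothetical numbers picked for illustration:

```python
# Fixed per-cell bytes: key length (4) + value length (4) + row length (2)
# + family length (1) + timestamp (8) + key type (1).
FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1


def keyvalue_size(row_len: int, family_len: int, qualifier_len: int, value_len: int) -> int:
    """Approximate uncompressed size in bytes of a single stored cell."""
    return FIXED_OVERHEAD + row_len + family_len + qualifier_len + value_len


if __name__ == "__main__":
    # Hypothetical cell: 24-byte row, 1-byte family, 8-byte qualifier, 1-byte value.
    total = keyvalue_size(row_len=24, family_len=1, qualifier_len=8, value_len=1)
    print("bytes per cell:", total)
    print("share taken by the value: {:.1%}".format(1 / total))
```

With values this small, almost everything stored is key material, so a short column family name, compact keys, and compression are the places to look first.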
Some additional resources:
Look at the tagged presentations on SlideShare from the last HBaseCon, for
example Benoit's talk
"Lessons learned from OpenTSDB" and Lars Hofhansl's "HBase Schema Design":
> On Tue, Oct 30, 2012 at 10:53 AM, Jean-Marc Spaggiari <
> [EMAIL PROTECTED]> wrote:
>> My 2¢.
>> 1) You need an odd number of ZooKeeper nodes. So 3 is the minimum
>> recommended for production.
>> 2) Yes, you have Master and SecondaryMaster. And it's also recommended
>> to have one of each. And the master is critical. If you lose
>> it, you lose your cluster.
>> 3) NameNode is Hadoop, not HBase. You should follow the Hadoop
>> recommendations. Just as you have a secondarymaster, you have a
>> secondarynamenode. So I think you should have as many
>> secondarynamenodes as you have secondarymasters (on the same machine?).
>> 4) I'm not sure I understand this question. Keys are binary: arrays
>> of bytes. So 32 0-1 values (bits) make a 4-byte array. It's not a lot.
>> This will only give you 2^32 different rows. You will have to
>> pre-split them, or you will end with almost all of them on the same
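
The pre-split advice above can be sketched as follows. This is only an illustration. It assumes fixed-length binary row keys spread uniformly over the key space (for example, hashed keys), which may not match the real key distribution, and the helper name `split_points` is hypothetical:

```python
from typing import List


def split_points(num_regions: int, key_len: int) -> List[bytes]:
    """Evenly spaced split keys dividing a key_len-byte key space into num_regions ranges."""
    keyspace = 256 ** key_len
    if num_regions < 1 or num_regions > keyspace:
        raise ValueError("num_regions must be between 1 and the size of the key space")
    # num_regions ranges need num_regions - 1 boundaries.
    return [
        (keyspace * i // num_regions).to_bytes(key_len, "big")
        for i in range(1, num_regions)
    ]


if __name__ == "__main__":
    # One region per region server in the 10-server cluster described in this thread.
    for key in split_points(num_regions=10, key_len=4):
        print(key.hex())
```

The resulting byte arrays would then be supplied at table-creation time, for instance through the `createTable(desc, splitKeys)` overload of `HBaseAdmin`.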
>> 2012/10/30, Varun Sharma <[EMAIL PROTECTED]>:
>>> We are planning to experiment with a cluster for serving production
>>> using hbase for pinterest. We are starting off with a 10 region server +
>>> master cluster on Amazon EMR version 0.92. I had some very naive
>>> (primarily around points of failure):
>>> 1) It seems hbase starts only one zookeeper on the master node - which is
>>> critical for operation - how many zookeepers should I use and can I run
>>> those on the region servers ?
>>> 2) How many masters to use - does hbase support multiple masters (primary
>>> and secondary) within the same cluster ? From my understanding, master
>>> availability is not critical for operation.
>>> 3) NameNode - We are running hadoop 0.8 - I have read that NameNode is a
>>> single point of failure and we should really be running two name node(s)
>>> we can failover. Is it fine to run these on the region servers ?
>>> 4) Our current application involves long row/column - 24-32 bytes with
>>> bytes of values. Should we be using a different key encoding than the
>>> default encoding ? What advantages could it buy us ?
>>> We are currently using amazon EMR for testing purposes which runs hbase
>>> 0.92. If it works well, we would like to configure our own cluster with
>>> probably the latest version of hbase which appears to be 0.94 at the
Marcos Luis Ortíz Valmaseda