Varun Sharma 2012-10-30, 17:41
Jean-Marc Spaggiari 2012-10-30, 17:53
Varun Sharma 2012-10-30, 18:03
Marcos Ortiz 2012-10-30, 20:20
Varun Sharma 2012-11-01, 08:01
Jeremy Carroll 2012-11-01, 16:31
Marcos Ortiz Valmaseda 2012-11-01, 11:17
For an HA NameNode you may want to look at the Hortonworks HDP 1.1 release. It is supported on vSphere and on RedHat HA clusters.
HDP 1.1 is based on Hadoop 1.0.3 and is fully certified for production environments.
Do not forget that Hadoop 2.0 is still in the alpha testing stage and cannot be recommended for production systems.
As for ZK nodes:
depending on the amount of ZK traffic, you may not need to put them on separate nodes; they could easily coexist with the DNs.
However, it is better to split the NN and the HBase Master onto separate nodes, e.g. NN on one node and HBase Master and JT on another node.
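To illustrate why odd ensemble sizes (three, five or seven, as suggested in the quoted message below) are the usual choice: ZooKeeper needs a majority of its servers to be up, so an even-sized ensemble tolerates no more failures than the next smaller odd one. A minimal Python sketch of that arithmetic follows; the function names are made up for illustration and are not part of any ZooKeeper API.

```python
def quorum_size(ensemble: int) -> int:
    """Smallest strict majority of an ensemble with the given number of servers."""
    return ensemble // 2 + 1


def tolerated_failures(ensemble: int) -> int:
    """Number of servers that can be down while a majority is still available."""
    return ensemble - quorum_size(ensemble)


if __name__ == "__main__":
    # Compare odd and even ensemble sizes side by side.
    for n in (3, 4, 5, 6, 7):
        print(f"ensemble={n} quorum={quorum_size(n)} tolerated_failures={tolerated_failures(n)}")
```

Going from 3 to 4 servers (or from 5 to 6) adds a machine without adding any failure tolerance, which is why the odd sizes are preferred.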
Technical Support Engineer
office: +1 855 846 7866 ext 292
mobile: +1 650 430 1673
On Nov 1, 2012, at 4:17 AM, Marcos Ortiz Valmaseda wrote:
> Regards, Varun.
> 1- I think that you should take a look at Cloudera Manager for CDH 4.1 to create an
> HA HDFS environment. Remember that version 2.0.x is not ready for production yet. The stable version is Hadoop 1.0.4 with HBase 0.94.2.
> 2- Yes, a recommended practice is to have a separate ZooKeeper ensemble (three, five or seven are good numbers for the ensemble) from your NN and HBase Master. For example:
> - 1 NN/HB Master, JT
> - 5 DN, HR Servers, TT
> - 3 nodes for the Zookeeper quorum.
> Best wishes.
> ----- Original message -----
> From: Varun Sharma <[EMAIL PROTECTED]>
> To: Marcos Ortiz <[EMAIL PROTECTED]>, kevin odell <[EMAIL PROTECTED]>
> CC: [EMAIL PROTECTED]
> Sent: Thu, 01 Nov 2012 03:01:55 -0500 (CST)
> Subject: Re: Hbase cluster for serving real time site traffic
> Thanks all for the helpful comments. I read up on HA and was wondering if
> there are good tools for setting up an HA HDFS + HBase cluster on EC2
> quickly. From my reading, it appears that tools like Whirr still have
> issues with bringing up the secondary NN on a different machine etc. Also,
> for availability, would Master-Slave replication or Master-Master
> replication be a substitute for having the secondary NN?
> For ZooKeeper, should the servers be running ZK only, or is it fine to share
> them with other services like the master? Also, is it better to have a
> dedicated ZooKeeper cluster per HBase cluster?
> On Tue, Oct 30, 2012 at 1:20 PM, Marcos Ortiz <[EMAIL PROTECTED]> wrote:
>> Regards, Varun, answers in line
>> On 10/30/2012 01:03 PM, Varun Sharma wrote:
>> Thanks for the tips.
>> So, yes, secondary NameNode is probably more critical than the secondary
>> master - since the master is only responsible for metadata changes/region
>> splits/table creation etc and not for writes/reads.
>> Exactly, you have to create a good HA strategy for these nodes (Master
>> and Secondary Master)
>> Regarding the keys question - I meant that the (row + column) length is
>> 24-32 bytes and the value length is 0-1 bytes. Currently, we have a cluster
>> running with all the data loaded into hbase but it all runs with default
>> There are many areas that you can optimize in a HBase cluster:
>> - Write operations
>> - Compactions and Split optimization
>> - Region Servers size
>> - Snappy compression
>> - Schema design
>> - Use of block caching for scan optimization
>> - Use of asynchronous clients for HBase operations (asynchbase for
>> Lars George's excellent book, "HBase: The Definitive Guide", has a complete
>> chapter on this tricky topic (Chapter 11)
>> Some additional resources:
>>  https://github.com/stumbleupon/asynchbase
>> Look at all the tagged presentations on SlideShare from the last HBaseCon, for
>> example Benoit's talk,
>> "Lessons learned from OpenTSDB", and Lars Hofhansl's "HBase Schema Design":
Patrick Angeles 2012-11-01, 19:11
Patrick Angeles 2012-11-01, 19:20
Stack 2012-11-01, 18:59
Kevin Odell 2012-10-30, 19:15
Kevin Odell 2012-10-30, 19:16