HBase, mail # user - Hbase cluster for serving real time site traffic


Re: Hbase cluster for serving real time site traffic
Varun Sharma 2012-11-01, 08:01
Thanks all for the helpful comments. I read up on HA and was wondering if
there are good tools for quickly setting up an HA HDFS + HBase cluster on
EC2. From my reading, it appears that tools like Whirr still have issues
with bringing up the secondary NN on a different machine, etc. Also, for
availability, would Master-Slave or Master-Master replication be a
substitute for having the secondary NN?

For ZooKeeper, should the servers be running ZK only, or is it fine to share
them with other services like the master? Also, is it better to have a
dedicated ZooKeeper cluster per HBase cluster?

Thanks
Varun

On Tue, Oct 30, 2012 at 1:20 PM, Marcos Ortiz <[EMAIL PROTECTED]> wrote:

>  Regards, Varun, answers in line
>
> On 10/30/2012 01:03 PM, Varun Sharma wrote:
>
> Thanks for the tips.
>
> So, yes, secondary NameNode is probably more critical than the secondary
> master - since the master is only responsible for metadata changes/region
> splits/table creation etc and not for writes/reads.
>
>  Exactly, you have to create a good HA strategy for these nodes (Master
> and Secondary Master)
>
>
>  Regarding the keys question - I meant that the (row + column) length is
> 24-32 bytes and the value length is 0-1 bytes. Currently, we have a cluster
> running with all the data loaded into HBase, but it all runs with default
> settings.
>
>  There are many areas that you can optimize in an HBase cluster:
> - Write operations
> - Compaction and split optimization
> - Region server sizing
> - Snappy compression
> - Schema design
> - Use of block caching for scan optimization
> - Use of asynchronous clients for HBase operations (asynchbase, for
> example [1])
> etc.
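
To make the schema design point concrete for keys of 24-32 bytes with values of 0-1 bytes, here is a rough sketch of how much of each stored cell is fixed overhead. It assumes the classic HBase KeyValue layout (length fields, row, family, qualifier, timestamp, type, value) and ignores compression, block encoding, and HFile index overhead. The function name and the example sizes are illustrative, not taken from the cluster discussed here.

```python
# Fixed-size fields in the classic KeyValue layout (an assumption for this sketch).
KEY_LENGTH_FIELD = 4      # int: total key length
VALUE_LENGTH_FIELD = 4    # int: value length
ROW_LENGTH_FIELD = 2      # short: row key length
FAMILY_LENGTH_FIELD = 1   # byte: column family name length
TIMESTAMP_FIELD = 8       # long: cell timestamp
KEY_TYPE_FIELD = 1        # byte: Put/Delete marker

FIXED_OVERHEAD = (KEY_LENGTH_FIELD + VALUE_LENGTH_FIELD + ROW_LENGTH_FIELD
                  + FAMILY_LENGTH_FIELD + TIMESTAMP_FIELD + KEY_TYPE_FIELD)


def keyvalue_size(row_len: int, family_len: int, qualifier_len: int, value_len: int) -> int:
    """Approximate uncompressed size in bytes of one cell."""
    return FIXED_OVERHEAD + row_len + family_len + qualifier_len + value_len


if __name__ == "__main__":
    # Hypothetical split of a 32-byte (row + column) key, 1-byte family, 1-byte value.
    size = keyvalue_size(row_len=24, family_len=1, qualifier_len=8, value_len=1)
    print("bytes per cell:", size)
    print("value share: {:.1%}".format(1 / size))
```

With values this small, almost all of the stored bytes are key and bookkeeping, which is why short column family names and compression tend to matter for this kind of table.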
>
> Lars George's excellent book "HBase: The Definitive Guide" has a complete
> chapter on this tricky topic (Chapter 11).
>
> Some additional resources:
>
> [1] https://github.com/stumbleupon/asynchbase
> https://github.com/twitter/finagle
> http://gbif.blogspot.com/2012/02/performance-evaluation-of-hbase.html
> http://gbif.blogspot.com/2012/02/monitoring-hadoop-and-hbase.html
> http://www.cloudera.com/blog/2011/04/hbase-dos-and-donts/
>
> Look at all the tagged presentations from the last HBaseCon on SlideShare,
> for example Benoit's talk
> "Lessons learned from OpenTSDB" and Lars Hofhansl's "HBase Schema Design":
> http://www.slideshare.net/cloudera/tag/hbasecon-2012
>
> Best wishes
>
> Thanks
> Varun
>
> On Tue, Oct 30, 2012 at 10:53 AM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:
>
>
>  My 2¢.
>
> 1) You need an odd number of ZooKeeper nodes. So 3 is the minimum
> recommended for production.
> 2) Yes, you have Master and SecondaryMaster. And it's also recommended
> to have one of each. And the master is critical. If you lose it, you
> lose your cluster.
> 3) NameNode is Hadoop, not HBase. You should follow the Hadoop
> recommendations. Just as you have a secondarymaster, you have a
> secondarynamenode. So I think you should have as many
> secondarynamenodes as you have secondarymasters (on the same machine?).
> 4) I'm not sure I understand this question. Keys are binary: arrays
> of bytes. So 32 0-1 bytes is a 3 bytes long array. It's not a lot.
> This will only give you 2^32 different rows. You will have to
> pre-split them, or you will end up with almost all of them on the same
> regionserver.
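
The odd-number rule in point 1 follows from ZooKeeper needing a strict majority of servers to be up. A minimal sketch of that arithmetic (the function names are made up for illustration):

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble with the given number of servers."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in range(1, 8):
        print("servers={} quorum={} tolerated_failures={}".format(
            n, quorum_size(n), tolerated_failures(n)))
```

Going from 3 to 4 servers does not raise the number of failures the ensemble can survive, which is why even sizes are usually avoided.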
>
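
On the pre-splitting mentioned in point 4, here is a rough sketch of how evenly spaced split points could be computed. It assumes row keys are spread uniformly over a fixed-length binary prefix (for example, hashed keys). With sequential keys, even splits would not balance the load. The function name and the default prefix length are hypothetical.

```python
from typing import List


def split_keys(num_regions: int, prefix_length: int = 4) -> List[bytes]:
    """Return num_regions - 1 evenly spaced split points over a binary key prefix.

    Assumes row keys are uniformly distributed over all byte strings of
    prefix_length bytes, compared in unsigned big-endian order.
    """
    if num_regions < 2:
        return []
    key_space = 256 ** prefix_length
    return [
        (key_space * i // num_regions).to_bytes(prefix_length, "big")
        for i in range(1, num_regions)
    ]


if __name__ == "__main__":
    # One region per region server in a 10 region server cluster.
    for key in split_keys(10):
        print(key.hex())
```

The resulting byte strings would then be supplied as split points when the table is created, so that each region server hosts a region from the start.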
> JM
>
> 2012/10/30, Varun Sharma <[EMAIL PROTECTED]>:
>
>  Hi,
>
> We are planning to experiment with a cluster for serving production traffic
> using hbase for pinterest. We are starting off with a 10 region server + 1
> master cluster on Amazon EMR version 0.92. I had some very naive questions
> (primarily around points of failure):
>
> 1) It seems hbase starts only one zookeeper on the master node - which is
> critical for operation - how many zookeepers should I use and can I run
> those on the region servers ?
> 2) How many masters to use - does hbase support multiple masters (primary
> and secondary) within the same cluster ? From my understanding, master