Zookeeper, mail # user - Distributed ZooKeeper cluster design

Re: Distributed ZooKeeper cluster design
Henry Robinson 2011-12-13, 18:37
On 13 December 2011 08:09, Dima Gutzeit <[EMAIL PROTECTED]> wrote:

> Ted and Camille,
> Thanks for a very detailed response.
> At the moment I have option A implemented in production and what I see
> is that ZK clients in A and B have "slow" performance (even reads) and I
> can't really blame the network since it does not look like a real
> bottleneck.
> I wonder if doing option 2 will improve the ZK client performance/speed ...
> As for my use case, it's around 50/50 reads and writes.

> As for fallback, of course in A and B I would want to define C as a backup,
> not sure how it can be done since as I understand if I supply several
> addresses in the connection string the client will use one, randomly.
> About Ted's suggestion to consider having several clusters and to have a
> special process to mirror, is it something available as part of ZooKeeper ?
> I also read about observers (is it available in 3.3.3 ?) and it seems to be
> a good option in my case, which brings me to the question of how to
> configure explicit fallback instead of random client selection ? If I want
> to tell ZK client in B to use the local B instance (observer) and if it
> fails then contact ANY server in the C (with a list of several).

I think observers are a good fit for you. If you need consistent writes
across A, B and C then you must send data across the WAN at some point.
Observers make it so that there's roughly half the number of WAN messages
in the critical path for a ZAB round initiated by an observer, because
observers don't vote. The general idea behind observers is that you can be
flexible about scaling out or placing client servers without slowing down
the voting process.
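
To make the observer setup concrete, here is a rough sketch of what the configuration might look like with three voters in C and one observer each in A and B. The hostnames and server ids are placeholders I made up, and the ports are just the conventional ones; the `peerType` setting and the `:observer` suffix are the settings described in the ZooKeeper Observers guide. Observers arrived in the 3.3 line, so 3.3.3 should have them, but please check the documentation for your exact release.

```properties
# zoo.cfg on each observer node only (for example the node in region B).
# Hostnames below are hypothetical placeholders.
peerType=observer

# Server list, the same on every node (voters and observers alike).
# The ":observer" suffix marks a non-voting member.
server.1=zk1.c.example.com:2888:3888
server.2=zk2.c.example.com:2888:3888
server.3=zk3.c.example.com:2888:3888
server.4=obs1.a.example.com:2888:3888:observer
server.5=obs1.b.example.com:2888:3888:observer
```

With this layout only the three servers in C vote, so the quorum stays inside C.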

Although there's no way to avoid the randomisation of server selection in
the way you want (would be a nice patch to have), what I suggest is
deploying enough observers in A and B. Then if there are failures in A or B
you can configure the clients to fail over to another local observer, and
never need to configure them to connect directly to C. You should be able
to add many observers without any noticeable performance impact.
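
As an illustration of the client side, the sketch below builds a connection string that lists only the observers local to one region. The client still picks randomly among them, but every candidate is local. The class name, helper method and hostnames are hypothetical; the only real API referred to is the `ZooKeeper(String connectString, int sessionTimeout, Watcher watcher)` constructor, and it appears only in a comment so that the snippet runs without the ZooKeeper jar.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

public class LocalObserverConnectString {

    // Joins host names into the comma-separated "host:port,host:port"
    // form that the ZooKeeper client accepts as its connect string.
    static String build(List<String> hosts, int clientPort) {
        StringJoiner joiner = new StringJoiner(",");
        for (String host : hosts) {
            joiner.add(host + ":" + clientPort);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Hypothetical observers deployed in region B.
        List<String> regionB = Arrays.asList(
                "obs1.b.example.com",
                "obs2.b.example.com");

        String connectString = build(regionB, 2181);
        System.out.println(connectString);
        // prints "obs1.b.example.com:2181,obs2.b.example.com:2181"

        // A client in region B would then be created with something like:
        //   new org.apache.zookeeper.ZooKeeper(connectString, 30000, watcher);
        // where the 30000 ms session timeout and the watcher are assumptions.
    }
}
```

Because no server in C is listed, a client in B never crosses the WAN itself; only the observers do.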


> Thanks in advance.
> Regards,
> Dima Gutzeit.
> On Tue, Dec 13, 2011 at 5:44 PM, Camille Fournier <[EMAIL PROTECTED]
> >wrote:
> > Ted is of course right, but to speculate:
> >
> > The idea you had with 3 in C, one in A and one in B isn't bad, given
> > some caveats.
> >
> > With 3 in C, as long as they are all available, quorum should live in
> > C and you shouldn't have much slowdown from the remote servers in A
> > and B. However, if you point your A servers only to the A zookeeper,
> > you have a failover risk where your A servers will have no ZK if the
> > server in region A goes down (same with B, of course). If you have a
> > lot of servers in the outer regions, this could be a risk. You are
> > also giving up any kind of load balancing for the A and B region ZKs,
> > which may not be important but is good to know.
> >
> > Another thing to be aware of is that the A and B region ZKs will have
> > slower write response time due to the WAN cost, and they will tend to
> > lag behind the majority cluster a bit. This shouldn't cause
> > correctness issues but could impact client performance in those
> > regions.
> >
> > Honestly, if you're doing a read-mostly workload in the A and B
> > regions, I doubt this is a bad design. It's pretty easy to test ZK
> > setups using Pat's zksmoketest utility, so you might try setting up
> > the sample cluster and running some of the smoketests on it.
> > (https://github.com/phunt/zk-smoketest/blob/master/zk-smoketest.py).
> > You could maybe also add observers in the outer regions to improve
> > client load balancing.
> >
> > C
> >
> >
> >
> > On Tue, Dec 13, 2011 at 9:05 AM, Ted Dunning <[EMAIL PROTECTED]>
> > wrote:
> > > Which option is preferred really depends on your needs.
> > >
> > > Those needs are likely to vary in read/write ratios, resistance to

Henry Robinson
Software Engineer