Zookeeper >> mail # user >> Distributed ZooKeeper cluster design


Re: Distributed ZooKeeper cluster design
Ted is of course right, but to speculate:

The idea you had with 3 in C, one in A and one in B isn't bad, given
some caveats.
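To make the quorum arithmetic behind this layout concrete, here is a small sketch. It assumes all five servers are voting participants and treats "losing a region" as all of that region's servers becoming unreachable, whether through an outage or a WAN partition. The region names and counts come from the question; everything else is illustrative.

```python
from itertools import combinations

# Layout from the question: 3 voting servers in C, 1 in A, 1 in B.
ENSEMBLE = {"C": 3, "A": 1, "B": 1}


def majority(n_voters: int) -> int:
    """Smallest number of voters that forms a strict majority."""
    return n_voters // 2 + 1


def survives(failed_regions: set, ensemble: dict) -> bool:
    """True if the servers outside the failed regions still form a quorum."""
    total = sum(ensemble.values())
    alive = sum(n for region, n in ensemble.items() if region not in failed_regions)
    return alive >= majority(total)


if __name__ == "__main__":
    regions = sorted(ENSEMBLE)
    for k in range(len(regions) + 1):
        for failed in combinations(regions, k):
            status = "quorum" if survives(set(failed), ENSEMBLE) else "NO quorum"
            print(f"lost {failed or 'nothing'}: {status}")
```

With 5 voters the majority is 3, so C alone can keep the ensemble running when A and B are both cut off, while losing C always loses quorum. That matches the stated requirement that C is the one region the service cannot live without.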

With 3 in C, as long as they are all available, quorum should live in
C and you shouldn't have much slowdown from the remote servers in A
and B. However, if you point your A servers only to the A ZooKeeper,
you have a failover risk: your A servers will have no ZK if the
server in region A goes down (same with B, of course). If you have a
lot of servers in the outer regions, this could be a risk. You are
also giving up any kind of load balancing for the A and B region ZKs,
which may not be important but is good to know.
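One way to reduce that failover risk is to give A clients a connection string that lists more than the local server. The sketch below only builds the string; the host names are hypothetical, and 2181 is ZooKeeper's conventional client port. The trade-off is that a client does not necessarily try hosts in the listed order, so some A clients may end up on the slower, remote C servers even while the A server is healthy.

```python
# Hypothetical host names per region; 2181 is the conventional client port.
SERVERS = {
    "A": ["zk-a1"],
    "B": ["zk-b1"],
    "C": ["zk-c1", "zk-c2", "zk-c3"],
}
CLIENT_PORT = 2181


def connect_string(regions, servers=SERVERS, port=CLIENT_PORT):
    """Join the servers of the given regions into a host:port,host:port string."""
    hosts = [h for r in regions for h in servers[r]]
    return ",".join(f"{h}:{port}" for h in hosts)


# Local only: clients in A have no ZooKeeper whenever zk-a1 is down.
local_only = connect_string(["A"])
# Local plus C: clients can fail over to the remote C servers.
with_fallback = connect_string(["A", "C"])

if __name__ == "__main__":
    print(local_only)
    print(with_fallback)
```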

Another thing to be aware of is that the A and B region ZKs will have
slower write response time due to the WAN cost, and they will tend to
lag behind the majority cluster a bit. This shouldn't cause
correctness issues but could impact client performance in those
regions.
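A rough way to see where the extra write latency comes from: a server in A or B has to forward each write to the leader (assumed here to sit in C) and only learns of the commit after a WAN round trip, while the 3-of-5 majority itself can be reached inside C. The model below is a back-of-the-envelope sketch, not a measurement; it ignores disk fsync and exact message counts, and the RTT values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rtts:
    """Round-trip times in milliseconds (hypothetical example values)."""
    local: float    # client <-> the ZooKeeper server in its own region
    intra_c: float  # between servers inside region C
    wan: float      # region A or B <-> region C


def read_latency(rtt: Rtts) -> float:
    # Reads are answered by the server the client is connected to.
    return rtt.local


def write_latency_via_c(rtt: Rtts) -> float:
    # Client in C: the leader (assumed in C) gets its majority from C servers.
    return rtt.local + rtt.intra_c


def write_latency_via_remote(rtt: Rtts) -> float:
    # Client in A/B: the local server forwards to the leader in C and hears
    # about the commit one WAN round trip later, on top of the quorum in C.
    return rtt.local + rtt.wan + rtt.intra_c


if __name__ == "__main__":
    r = Rtts(local=0.5, intra_c=1.0, wan=80.0)
    print(read_latency(r), write_latency_via_c(r), write_latency_via_remote(r))
```

Under these made-up numbers reads stay local everywhere, while writes from A or B are dominated by the WAN term, which is why a read-mostly workload in those regions suffers least.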

Honestly, if you're doing a read-mostly workload in the A and B
regions, I doubt this is a bad design. It's pretty easy to test ZK
setups using Pat's zksmoketest utility, so you might try setting up
the sample cluster and running some of the smoketests on it.
(https://github.com/phunt/zk-smoketest/blob/master/zk-smoketest.py).
You could maybe also add observers in the outer regions to improve
client load balancing.
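Observers receive and serve the same data but do not vote, so adding them in A and B gives local clients more servers to spread across without changing the quorum size. The sketch below generates the server.N lines of a zoo.cfg for one possible layout: the five voters from the question plus one extra observer in each outer region. The host names are hypothetical and 2888/3888 are the conventional quorum and election ports; the observer machines themselves would also set peerType=observer in their own config.

```python
# Hypothetical hosts: the 5 voters from the question plus one observer in A and B.
VOTERS = ["zk-c1", "zk-c2", "zk-c3", "zk-a1", "zk-b1"]
OBSERVERS = ["zk-a2", "zk-b2"]
QUORUM_PORT, ELECTION_PORT = 2888, 3888  # conventional ZooKeeper ports


def server_lines(voters, observers):
    """Build the server.N entries shared by every node's zoo.cfg."""
    lines = []
    for i, host in enumerate(voters + observers, start=1):
        entry = f"server.{i}={host}:{QUORUM_PORT}:{ELECTION_PORT}"
        if host in observers:
            entry += ":observer"  # non-voting member, never counted in quorum
        lines.append(entry)
    return lines


def voting_majority(voters):
    # Only voting participants count, so observers leave this unchanged.
    return len(voters) // 2 + 1


if __name__ == "__main__":
    print("\n".join(server_lines(VOTERS, OBSERVERS)))
    print("majority of voters:", voting_majority(VOTERS))
```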

C

On Tue, Dec 13, 2011 at 9:05 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:
> Which option is preferred really depends on your needs.
>
> Those needs are likely to vary in read/write ratios, resistance to network
> and so on.  You should also consider the possibility of observers in the
> remote locations.  You might also consider separate ZK clusters in each
> location with a special process to send mirrors of changes to these other
> locations.
>
> A complete and detailed answer really isn't possible without knowing the
> details of your application.  I generally don't like distributing a ZK
> cluster across distant hosts because it makes everything slower and more
> delicate, but I have heard of examples where that is exactly the right
> answer.
>
> On Tue, Dec 13, 2011 at 4:29 AM, Dima Gutzeit
> <[EMAIL PROTECTED]> wrote:
>
>> Dear list members,
>>
>> I have a question related to "suggested" way of working with ZooKeeper
>> cluster from different geographical locations.
>>
>> Let's assume a service spans several regions, A, B and C, where C is
>> defined as an element that the service cannot live without and A and B are
>> not critical.
>>
>> Option one:
>>
>> Having one cluster of several ZooKeeper nodes in one location (C) and
>> accessing it from all locations: A, B and C.
>>
>> Option two:
>>
>> Having a ZooKeeper cluster span all regions, i.e. 3 nodes in C, one in
>> A and one in B. This way the clients residing in A and B will access their
>> local ZooKeeper.
>>
>> Which option is preferred, and which will work faster from the client's
>> perspective?
>>
>> Thanks in advance.
>>
>> Regards,
>> Dima Gutzeit
>>