Re: zookeeper cluster spanning datacenters
Mahadev Konar 2011-09-22, 20:53
On Sep 22, 2011, at 1:45 PM, Vishal Kher wrote:

> Hi Camille,
> This is very interesting.
> Can you give more info on your setup?
> - What network connectivity (bandwidth and latency) do you have between the
> data centers? How much of the bandwidth is available for ZK?
> - What are the timeout (server and client session timeout) values that you
> use? How much latency are the applications willing to tolerate?
> We are thinking of running ZK across data centers as well, and it would be
> great to see how others are resolving some of these problems.
> Thanks.
> -Vishal
> On Thu, Sep 22, 2011 at 11:03 AM, Fournier, Camille F. <
>> We spread our ZKs across 3 data centers and in fact, these data centers are
>> split across global regions (2 or 4 in one region, one in a remote region).
>> To keep throughput up (and note that the throughput you have to worry about
>> is only write throughput), we always ensure that the leader is in one of the
>> "local" data centers.
>> If you have a very write-heavy and write-time-sensitive load, this might
>> affect your performance. It won't affect reads at all, because reads are
>> served from the memory of the ZK server you connect to. For a mostly
>> read-intensive load, splitting across data centers is unlikely to cause you
>> problems.
>> There is one exception: monitoring. Even across data centers in the same
>> region, we sometimes see the zk dashboard unable to properly monitor the
>> leader of a heavily utilized cluster. This is due to the way the
>> four-letter-word (4lw) connections are managed, which is something I'm
>> trying to fix.
>> If you have the machines to test, I would recommend running zk-smoketest
>> (https://github.com/phunt/zk-smoketest) on the proposed config.
>> Hi,
>> I would like to know the downsides of having a zookeeper cluster that spans
>> multiple datacenters. The requirement is that a datacenter failure should
>> not bring down the zookeeper cluster. From my understanding, a hot/cold
>> cluster setup is not possible, so we are thinking of putting zk servers in
>> 3 colos (1+1+1 or 2+2+3). One major drawback I can think of is that latency
>> will affect the throughput of the system. The system does not require high
>> throughput and can accept some latency. How much effect will the latency
>> have on the throughput of the system? What are the other downsides of
>> spreading the cluster across datacenters?
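
For anyone weighing layouts like the 1+1+1 or 2+2+3 mentioned above, here is a minimal sketch of the arithmetic. It assumes ZooKeeper's default majority quorum (no hierarchical quorums or weights), and the function name is just something made up for illustration. It checks whether the ensemble still has a strict majority after losing any single data center.

```python
def survives_any_dc_failure(layout):
    """Return True if losing any one data center still leaves a majority.

    layout: list of server counts per data center, e.g. [2, 2, 3].
    Assumes the default majority quorum (more than half of all servers).
    """
    total = sum(layout)
    quorum = total // 2 + 1  # smallest strict majority of the full ensemble
    # Each data center is removed in turn; the survivors must reach quorum.
    return all(total - n >= quorum for n in layout)


if __name__ == "__main__":
    for layout in ([1, 1, 1], [2, 2, 3], [2, 2], [3, 2]):
        print(layout, survives_any_dc_failure(layout))
```

Under that assumption both proposed layouts tolerate the loss of one colo, while any two-colo split does not, since one of the two sites always holds at least half of the servers.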

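Since the advice above depends on knowing which server is currently the leader, here is a rough sketch of how one might check that with the `stat` four-letter word over the client port. The helper names, the host names in the usage note, and the 5-second timeout are assumptions for illustration, not anything from this thread. It relies on the server closing the connection after replying and on the `Mode:` line in the `stat` output.

```python
import socket


def four_letter_word(host, port, cmd, timeout=5.0):
    """Send a four-letter-word command (e.g. "stat") and return the reply text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(stat_output):
    """Extract the value of the "Mode:" line (leader, follower, ...) or None."""
    for line in stat_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None
```

A caller could loop over hypothetical hosts such as `zk1.example.com` on port 2181 and call `parse_mode(four_letter_word(host, 2181, "stat"))` to see which one reports `leader`. The explicit timeout matters here, given the report above that 4lw requests to a busy leader can stall.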