ZooKeeper update collisions
Cameron McKenzie 2013-09-18, 02:19
Another quick question about some ZooKeeper functionality. Say I have 3 ZK
nodes running in a cluster: 2 (ZK1 and ZK2) at site 1 and 1 (ZK3) at site 2.
In the case of a full site outage at site 1 (i.e. ZK1 and ZK2 die), ZK3
cannot form a quorum and thus ZK will stop accepting connections.
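
To make the quorum arithmetic in this scenario concrete, here is a minimal
sketch. It assumes the default majority quorum (no weights or hierarchical
groups); the function names and the placement dictionary are hypothetical
helpers for illustration, not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the ensemble (assumes majority quorums)."""
    return ensemble_size // 2 + 1


def has_quorum(ensemble_size: int, live_servers: int) -> bool:
    """True if the surviving servers still form a strict majority."""
    return live_servers >= quorum_size(ensemble_size)


# Placement from the scenario: site name -> number of ZK servers hosted there.
placement = {"site1": 2, "site2": 1}
total = sum(placement.values())

# Check what happens when each site is lost in turn.
for failed_site, lost in placement.items():
    survivors = total - lost
    print(f"{failed_site} down: {survivors} of {total} alive, "
          f"quorum: {has_quorum(total, survivors)}")
```

With two sites and three servers, one site always holds the majority, so
losing that site always costs the quorum regardless of how the servers are
split.
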
Is there any way to temporarily enable node ZK3 to become master until at
least one of ZK1 and ZK2 is resurrected and a quorum can be formed?
Specifically so that any updates made to it are transferred back to the
rest of the cluster once they reconnect.
I have tried running ZK3 as a standalone server (i.e. by commenting out the
other nodes in its configuration), but this doesn't seem to work: when ZK3
rejoins the cluster, its writes seem to be ignored. Indeed, in some cases it
led to other weirdness, with a znode having one value on ZK3 and another on
ZK1 and ZK2.
Ultimately it would seem that having a third site would minimise the
likelihood of this happening, but the issue still persists. How are people
solving this issue in production systems?