Zookeeper, mail # user - Zookeeper server recovery behaviors

Martin Kou 2012-04-19, 18:29
Hi folks,

I've got a few questions about how Zookeeper servers behave in fail-recover
scenarios.

Assume I have a 5-server Zookeeper cluster, and one of the servers goes dark
and comes back about an hour later.

   1. Is it correct to assume that clients won't be able to connect to the
   recovering server while it's still synchronizing with the leader, and thus
   any new client connections would automatically fall back to the other 4
   servers during synchronization?
   2. The documentation says a newly recovered server has (initLimit
   * tickTime) milliseconds to synchronize with the leader when it's
   restarted. Is it correct to assume the time needed for synchronization is
   bounded by the amount of data managed by Zookeeper? Let's say in the worst
   case, someone set a very large snapCount on the cluster, and there were a
   lot of transactions but not a lot of znodes - so there isn't much data in
   each Zookeeper server, but there is a very long transaction log. Would
   that bound still hold?
   3. I noticed from the documentation that a Zookeeper server falling more
   than (syncLimit * tickTime) milliseconds behind the leader will be dropped
   from quorum.
   I guess that's for detecting network partitions, right? If the partitioned
   server does report back to the leader later, how would it behave? (e.g.
   would it deny new client connections while it's synchronizing?)
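
To make the two windows in questions 2 and 3 concrete, here is a minimal
sketch of the arithmetic, assuming tickTime is given in milliseconds and
initLimit and syncLimit are counts of ticks. The config values and the helper
names (parse_zoo_cfg, timeout_windows_ms) are made up for illustration and are
not my actual settings or part of Zookeeper itself.

```python
def parse_zoo_cfg(text):
    """Parse key=value lines from zoo.cfg-style text, skipping comments."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        cfg[key.strip()] = value.strip()
    return cfg


def timeout_windows_ms(cfg):
    """Return the initial-sync and follower-lag windows in milliseconds."""
    tick_ms = int(cfg["tickTime"])  # length of one tick, in milliseconds
    return {
        "init_ms": int(cfg["initLimit"]) * tick_ms,  # window from question 2
        "sync_ms": int(cfg["syncLimit"]) * tick_ms,  # window from question 3
    }


# Hypothetical values, chosen only to illustrate the calculation.
SAMPLE_CFG = """
# illustrative settings
tickTime=2000
initLimit=10
syncLimit=5
"""

windows = timeout_windows_ms(parse_zoo_cfg(SAMPLE_CFG))
print(windows)
```

With these sample values the recovering server would get 20 seconds for the
initial synchronization and could lag the leader by at most 10 seconds
afterwards, which is why I'm wondering whether a long transaction log could
push the initial sync past that first window.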


Best Regards,
Martin Kou