Zookeeper, mail # user - Does Leader Election Have a "Settling" period?


Does Leader Election Have a "Settling" period?
Mark Gius 2012-05-10, 20:35
I'm testing how a Client behaves when the ZooKeeper server it is connected to
goes away, and I'm seeing what appears to be a "settling" period that causes
some errors.

The test is as follows:

 1) Three zookeeper servers are started up on the same host, configured to
cluster with each other.
 2) A Client is created and attaches to Server 1 (using
deterministic_conn_order flag to force this)
 3) Shut down Server 1 (which is NOT the Leader)
 4) Servers 2 and 3 still have quorum.  Interruption of service should be
minimal.
 5) The Client _should_ reconnect immediately to Server 2 or 3.
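
A setup like the one in step 1 could be generated roughly as follows. This is
only a sketch: the ports, data directories, and the initLimit/syncLimit values
are illustrative assumptions, not the configuration actually used in this test,
and make_zoo_cfg and connect_string are hypothetical helper names.

```python
def make_zoo_cfg(server_id, n_servers=3, tick_time_ms=2000):
    """Build zoo.cfg text for one server of an ensemble running on one host.

    All ports and paths below are assumed values chosen for illustration.
    Each server also needs a file named "myid" in its dataDir containing
    its server_id.
    """
    lines = [
        f"tickTime={tick_time_ms}",
        "initLimit=10",   # assumed value
        "syncLimit=5",    # assumed value
        f"dataDir=/tmp/zk-test/server{server_id}",  # assumed path
        f"clientPort={2180 + server_id}",           # 2181, 2182, 2183
    ]
    # Every server lists the full ensemble; on a single host each entry
    # needs its own pair of quorum and election ports.
    for i in range(1, n_servers + 1):
        lines.append(f"server.{i}=localhost:{2887 + i}:{3887 + i}")
    return "\n".join(lines) + "\n"


def connect_string(n_servers=3):
    """Client connect string listing Server 1 first."""
    return ",".join(f"localhost:{2180 + i}" for i in range(1, n_servers + 1))


if __name__ == "__main__":
    for sid in (1, 2, 3):
        print(f"--- server {sid} ---")
        print(make_zoo_cfg(sid))
    print(connect_string())
```

With a deterministic connection order, a client given this connect string would
try Server 1 first, which matches step 2.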

In practice, after Server 1 is shut down, quorum is lost and the Client takes
on the order of 15-20 seconds to re-establish a connection to the cluster.  I
do not see this behavior on a cluster that has been running for some time
(say, 30-60 seconds), nor on a cluster whose tickTime has been decreased from
the default of 2000ms to 100ms.
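
For reference, several server-side timing windows are expressed as multiples of
tickTime, so a smaller tickTime shrinks all of them together. The sketch below
shows the arithmetic only. The initLimit and syncLimit values are assumed, and
the session timeout bounds use the documented defaults of 2 and 20 ticks.
Whether any of these windows accounts for the 15-20 second delay is not
established here.

```python
def timing_windows(tick_time_ms, init_limit=10, sync_limit=5):
    """Return tickTime-derived timing windows in milliseconds.

    init_limit and sync_limit are assumed values for illustration;
    the session timeout bounds are the documented defaults
    (2 * tickTime and 20 * tickTime).
    """
    return {
        "min_session_timeout_ms": 2 * tick_time_ms,
        "max_session_timeout_ms": 20 * tick_time_ms,
        "init_window_ms": init_limit * tick_time_ms,
        "sync_window_ms": sync_limit * tick_time_ms,
    }


if __name__ == "__main__":
    for tick in (2000, 100):
        print(tick, timing_windows(tick))
```

Under these assumptions, a tickTime of 2000ms gives an initLimit window of 20
seconds and a syncLimit window of 10 seconds, while a tickTime of 100ms gives 1
second and 0.5 seconds respectively.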

Is there a settling period immediately after a Leader is elected, during which
a change in quorum membership triggers a full leader election that would not
otherwise be necessary?  If so, where can I find information about how this
settling period behaves?

I have uploaded the logs for each of the three zookeeper servers here:
https://gist.github.com/2655709

Mark