Zookeeper >> mail # user >> Does Leader Election Have a "Settling" period?


Mark Gius 2012-05-10, 20:35
Flavio Junqueira 2012-05-11, 11:25
Re: Does Leader Election Have a "Settling" period?
Hmm... so then it looks like the problem was that I needed to give 3 a
little more time to join the quorum before shooting 1 so as to maintain
quorum throughout the test.  I'll give that a shot.   Thanks!

Mark
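
One way a test harness could give server 3 that extra time is to poll it until it reports a serving mode before shutting down server 1. The sketch below is only an illustration, not something from the thread: it uses ZooKeeper's `stat` four-letter command, which prints a `Mode:` line once a server is serving, and assumes server 3 listens for clients on port 2184 of the local host (the port that appears in its log lines). The function names are hypothetical, and newer ZooKeeper releases may require the four-letter commands to be enabled explicitly.

```python
import socket
import time


def parse_mode(stat_output):
    """Return the value of the 'Mode:' line in a `stat` reply, or None if absent."""
    for line in stat_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def query_mode(host, port, timeout=2.0):
    """Send `stat` to one server; return its mode, or None if it is unreachable
    or not serving requests yet (no 'Mode:' line in the reply)."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"stat")
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:  # server closes the connection after replying
                    break
                chunks.append(data)
    except OSError:
        return None
    return parse_mode(b"".join(chunks).decode("utf-8", "replace"))


def wait_for_mode(host, port, wanted=("follower", "leader"),
                  deadline_s=30.0, poll_s=0.5):
    """Poll until the server reports one of the wanted modes or the deadline passes."""
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        if query_mode(host, port) in wanted:
            return True
        time.sleep(poll_s)
    return False
```

In the test described in this thread, the harness would call something like `wait_for_mode("localhost", 2184)` and shut down server 1 only once it returns `True`.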

On Fri, May 11, 2012 at 4:25 AM, Flavio Junqueira <[EMAIL PROTECTED]> wrote:

> Hi Mark, From your logs, server 2 was leading and was followed only by
> server 1:
>
> 2012-05-09 01:07:25,523 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2183:Leader@390] - Shutdown called
> java.lang.Exception: shutdown Leader! reason: Only 0 followers, need 1
>        at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:390)
>        at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:367)
>        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:658)
>
> Consequently, when you shut down 1, the ensemble lost quorum. The sequence
> of notifications made 3 think that it was leading, but it didn't become
> established as leader because it didn't have a quorum supporting it.
> Eventually 3 gave up and started following 2:
>
> 2012-05-09 01:08:08,961 - INFO  [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967297 (n.zxid), 2 (n.round), LOOKING (n.state), 3 (n.sid), LOOKING (my state)
> 2012-05-09 01:08:09,163 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2184:QuorumPeer@643] - FOLLOWING
>
> and 2 leading:
>
> 2012-05-09 01:08:08,959 - INFO  [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967297 (n.zxid), 2 (n.round), LOOKING (n.state), 2 (n.sid), LOOKING (my state)
> 2012-05-09 01:08:08,959 - INFO  [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 0 (n.zxid), 2 (n.round), LOOKING (n.state), 3 (n.sid), LOOKING (my state)
> 2012-05-09 01:08:08,961 - INFO  [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967297 (n.zxid), 2 (n.round), LOOKING (n.state), 3 (n.sid), LOOKING (my state)
> 2012-05-09 01:08:09,162 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2183:QuorumPeer@655] - LEADING
>
> I'm not sure why it took so much time for the notifications to propagate,
> though.
>
> -Flavio
>
> On May 10, 2012, at 10:35 PM, Mark Gius wrote:
>
> > I'm doing some testing around a Client being connected to a zookeeper
> > endpoint that goes away and I'm seeing what appears to be a "settling"
> > period that is causing some errors.
> >
> > The test is as follows:
> >
> > 1) Three zookeeper servers are started up on the same host, configured to
> > cluster with each other.
> > 2) A Client is created and attaches to Server 1 (using
> > deterministic_conn_order flag to force this)
> > 3) Shut down Server 1 (which is NOT the Leader)
> > 4) Servers 2 and 3 still have quorum.  Interruption of service should be
> > minimal.
> > 5) The Client _should_ reconnect immediately to Server 2 or 3.
> >
> > The behavior I am seeing in practice is that after shutting down Server 1
> > quorum is lost and the Client takes on the order of 15-20 seconds to
> > re-establish a connection to the cluster.  I do not see this behavior on a
> > cluster that has existed for some time (say, 30-60 seconds).  I also do not
> > see this problem on a cluster whose tickTime has been decreased to 100ms
> > from the default of 2000ms.
> >
> > Is there a settling period that occurs immediately after a Leader is
> > elected such that changes in quorum membership during that time cause a
> > full leader election when one might not otherwise be necessary?  If so,
> > where can I find information about how this settling period behaves?
> >
> > I have uploaded the logs for each of the three zookeeper servers here:
> > https://gist.github.com/2655709
> >
> > Mark
>
>
>
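
The two numbers that matter in the logs quoted above can be checked with a little arithmetic. The sketch below is an illustration under stated assumptions rather than ZooKeeper's own code: it assumes plain majority quorums (no weights or hierarchical groups), and the function names are hypothetical. It shows why a leader in a three-server ensemble reports "need 1" follower, and how the zxid `4294967297` splits into an epoch (high 32 bits) and a counter (low 32 bits).

```python
def majority(ensemble_size):
    """Smallest number of servers that forms a majority quorum."""
    return ensemble_size // 2 + 1


def followers_needed(ensemble_size):
    """Followers a leader must keep; the leader counts toward its own quorum."""
    return majority(ensemble_size) - 1


def split_zxid(zxid):
    """Split a 64-bit zxid into (epoch, counter)."""
    return zxid >> 32, zxid & 0xFFFFFFFF


if __name__ == "__main__":
    print(majority(3))             # 2
    print(followers_needed(3))     # 1
    print(split_zxid(4294967297))  # (1, 1)
```

With three servers a leader must keep at least one follower, so with server 3 not yet following, shutting down server 1 left server 2 with none. Server 3 advertising zxid 0 while server 2 advertises epoch 1, counter 1 is consistent with 3 eventually backing 2, since the vote with the higher zxid wins.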
Martin Kou 2012-05-10, 23:45