Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Zookeeper >> mail # user >> Leader election failure


Copy link to this message
-
Re: Leader election failure
We are currently testing out 3.5.0.  If the fix made it into 3.4.4, I
assume that issue is also fixed in 3.5.0?

~Jared

On Thu, Jul 26, 2012 at 11:11 PM, Hanno Schlichting <[EMAIL PROTECTED]>wrote:

> What version of ZK are yoy using? There's a bug in 3.4.x with 5 node
> clusters failing to agree on a leader. That's only solved in the yet
> unreleased 3.4.4.
>
> Hanno
>
> On 27.07.2012, at 06:40, Jared Cantwell <[EMAIL PROTECTED]> wrote:
>
> > I have a 5 node cluster configured using dynamic zookeeper.  It has been
> > through several reconfigurations, but at the moment I am simply trying to
> > start 3 of the nodes to get ZK accessible.  I have confirmed that the
> myid
> > files match the entries in the dynamic membership file for the 3 nodes in
> > question.  However, when I start up the three nodes I get the following
> > error:
> >
> > 2012-07-26 22:26:01,037 [myid:8] - INFO  [QuorumPeer[myid=8]/
> 10.10.5.27:2181
> > :Leader@445] - LEADING - LEADER ELECTION TOOK - 13
> > 2012-07-26 22:26:01,039 [myid:8] - INFO  [QuorumPeer[myid=8]/
> 10.10.5.27:2181
> > :FileSnap@83] - Reading snapshot /sf/data/zookeeper/
> > 10.10.5.27/version-2/snapshot.3000001e3
> > 2012-07-26 22:26:01,065 [myid:8] - INFO  [QuorumPeer[myid=8]/
> 10.10.5.27:2181
> > :FileTxnSnapLog@270] - Snapshotting: 0x3000001e3 to /sf/data/zookeeper/
> > 10.10.5.27/version-2/snapshot.3000001e3
> > 2012-07-26 22:26:10,837 [myid:8] - INFO
> > [WorkerReceiver[myid=8]:FastLeaderElection@635] - Notification: 8
> > (n.leader), 0x3000001e3 (n.zxid), 0x1 (n.round), LOOKING (n.state), 9
> > (n.sid), 0x3 (n.peerEPoch), LEADING (my state)300000147 (n.config
> version)
> > 2012-07-26 22:26:20,849 [myid:8] - INFO
> > [WorkerReceiver[myid=8]:FastLeaderElection@635] - Notification: 8
> > (n.leader), 0x3000001e3 (n.zxid), 0x1 (n.round), LOOKING (n.state), 9
> > (n.sid), 0x3 (n.peerEPoch), LEADING (my state)300000147 (n.config
> version)
> > 2012-07-26 22:26:21,083 [myid:8] - WARN  [QuorumPeer[myid=8]/
> 10.10.5.27:2181
> > :QuorumPeer@949] - Unexpected exception
> > java.lang.InterruptedException: *Timeout while waiting for epoch from
> quorum
> > *
> >        at
> >
> org.apache.zookeeper.server.quorum.Leader.getEpochToPropose(Leader.java:1207)
> >        at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:464)
> >        at
> > org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:946)
> > 2012-07-26 22:26:21,083 [myid:8] - INFO  [QuorumPeer[myid=8]/
> 10.10.5.27:2181
> > :Leader@614] - Shutting down
> > 2012-07-26 22:26:21,083 [myid:8] - INFO  [QuorumPeer[myid=8]/
> 10.10.5.27:2181
> > :Leader@620] - Shutdown called
> > java.lang.Exception: shutdown Leader! reason: Forcing shutdown
> >        at
> > org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:620)
> >        at
> > org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:952)
> > 2012-07-26 22:26:21,084 [myid:8] - INFO  [QuorumPeer[myid=8]/
> 10.10.5.27:2181
> > :ZooKeeperServer@413] - shutting down
> > 2012-07-26 22:26:21,084 [myid:8] - INFO
> > [LearnerCnxAcceptor-0.0.0.0/0.0.0.0:2182:Leader$LearnerCnxAcceptor@407]
> -
> > exception while shutting down acceptor: java.net.SocketException: Socket
> > closed
> >
> > I am not sure what to make of it or how to debug from here.  Any pointers
> > or suggestions on how to debug what might be wrong, or simply some usual
> > causes of this error would be appreciated.
> >
> > Thanks!
> > Jared
>