Zookeeper >> mail # user >> Leader election failure


Jared Cantwell 2012-07-27, 04:40
Hanno Schlichting 2012-07-27, 05:11
Jared Cantwell 2012-07-27, 05:20
Re: Leader election failure
It looks like I had a firewall issue with the leader election port on one
of the nodes.  Sorry for the spam.

~Jared
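
For anyone landing on this thread with the same "Timeout while waiting for epoch from quorum" error, a firewall problem like the one described above can be checked with a plain TCP probe. The following is a minimal sketch, not anything from the original posts: the helper names `port_reachable` and `check_peers` are hypothetical, and the port numbers in the usage comment are assumptions (2182 is the quorum port that appears in the log below; 3888 is only the conventional election port from the ZooKeeper examples, since the thread never states the one actually configured).

```python
import socket


def port_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_peers(peers, timeout=2.0):
    """Map each (host, port) pair to whether it accepted a TCP connection."""
    return {
        (host, port): port_reachable(host, port, timeout)
        for host, port in peers
    }


# Example usage (hypothetical peer list; substitute the host:port pairs
# from your own server.N entries):
#
#   results = check_peers([("10.10.5.27", 2182), ("10.10.5.27", 3888)])
#   for (host, port), ok in results.items():
#       print(f"{host}:{port} {'open' if ok else 'BLOCKED or closed'}")
```

A firewall can block traffic in one direction only, so a probe like this is most useful when run from every node against every other node. A successful connection only shows that the port accepts TCP connections; it says nothing about whether the election itself is healthy.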

On Thu, Jul 26, 2012 at 11:20 PM, Jared Cantwell <[EMAIL PROTECTED]> wrote:

> We are currently testing out 3.5.0.  If the fix made it into 3.4.4, I
> assume that issue is also fixed in 3.5.0?
>
> ~Jared
>
>
> On Thu, Jul 26, 2012 at 11:11 PM, Hanno Schlichting <[EMAIL PROTECTED]> wrote:
>
>> What version of ZK are you using? There's a bug in 3.4.x with 5-node
>> clusters failing to agree on a leader. That's only solved in the yet
>> unreleased 3.4.4.
>>
>> Hanno
>>
>> On 27.07.2012, at 06:40, Jared Cantwell <[EMAIL PROTECTED]> wrote:
>>
>> > I have a 5 node cluster configured using dynamic zookeeper.  It has been
>> > through several reconfigurations, but at the moment I am simply trying to
>> > start 3 of the nodes to get ZK accessible.  I have confirmed that the myid
>> > files match the entries in the dynamic membership file for the 3 nodes in
>> > question.  However, when I start up the three nodes I get the following
>> > error:
>> >
>> > 2012-07-26 22:26:01,037 [myid:8] - INFO  [QuorumPeer[myid=8]/10.10.5.27:2181:Leader@445] - LEADING - LEADER ELECTION TOOK - 13
>> > 2012-07-26 22:26:01,039 [myid:8] - INFO  [QuorumPeer[myid=8]/10.10.5.27:2181:FileSnap@83] - Reading snapshot /sf/data/zookeeper/10.10.5.27/version-2/snapshot.3000001e3
>> > 2012-07-26 22:26:01,065 [myid:8] - INFO  [QuorumPeer[myid=8]/10.10.5.27:2181:FileTxnSnapLog@270] - Snapshotting: 0x3000001e3 to /sf/data/zookeeper/10.10.5.27/version-2/snapshot.3000001e3
>> > 2012-07-26 22:26:10,837 [myid:8] - INFO  [WorkerReceiver[myid=8]:FastLeaderElection@635] - Notification: 8 (n.leader), 0x3000001e3 (n.zxid), 0x1 (n.round), LOOKING (n.state), 9 (n.sid), 0x3 (n.peerEPoch), LEADING (my state)300000147 (n.config version)
>> > 2012-07-26 22:26:20,849 [myid:8] - INFO  [WorkerReceiver[myid=8]:FastLeaderElection@635] - Notification: 8 (n.leader), 0x3000001e3 (n.zxid), 0x1 (n.round), LOOKING (n.state), 9 (n.sid), 0x3 (n.peerEPoch), LEADING (my state)300000147 (n.config version)
>> > 2012-07-26 22:26:21,083 [myid:8] - WARN  [QuorumPeer[myid=8]/10.10.5.27:2181:QuorumPeer@949] - Unexpected exception
>> > java.lang.InterruptedException: *Timeout while waiting for epoch from quorum*
>> >        at org.apache.zookeeper.server.quorum.Leader.getEpochToPropose(Leader.java:1207)
>> >        at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:464)
>> >        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:946)
>> > 2012-07-26 22:26:21,083 [myid:8] - INFO  [QuorumPeer[myid=8]/10.10.5.27:2181:Leader@614] - Shutting down
>> > 2012-07-26 22:26:21,083 [myid:8] - INFO  [QuorumPeer[myid=8]/10.10.5.27:2181:Leader@620] - Shutdown called
>> > java.lang.Exception: shutdown Leader! reason: Forcing shutdown
>> >        at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:620)
>> >        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:952)
>> > 2012-07-26 22:26:21,084 [myid:8] - INFO  [QuorumPeer[myid=8]/10.10.5.27:2181:ZooKeeperServer@413] - shutting down
>> > 2012-07-26 22:26:21,084 [myid:8] - INFO  [LearnerCnxAcceptor-0.0.0.0/0.0.0.0:2182:Leader$LearnerCnxAcceptor@407] - exception while shutting down acceptor: java.net.SocketException: Socket closed
>> >
>> > I am not sure what to make of it or how to debug from here.  Any pointers
>> > or suggestions on how to debug what might be wrong, or simply some usual
>> > causes of this error would be appreciated.
>> >
>> > Thanks!
>> > Jared
>>
>
>
David Nickerson 2012-07-27, 17:01