Zookeeper, mail # user - Re: zookeeper quorum falling apart with continuous leader election - 2014-02-12, 14:48
It sounds like leader election is completing periodically, but the servers are not able to complete the synchronization step. There is also this connection-refused exception when the follower tries to connect to the leader. Here is what I spotted in the follower's log:

2014-02-10 18:54:04,414 [myid:234] - INFO  [QuorumPeer[myid=234]/0:0:0:0:0:0:0:0:2181:Follower@65] - FOLLOWING - LEADER ELECTION TOOK - 1
2014-02-10 18:54:04,415 [myid:234] - WARN  [QuorumPeer[myid=234]/0:0:0:0:0:0:0:0:2181:Learner@239] - Unexpected exception, tries=0, connecting to
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
at java.net.SocksSocketImpl.connect(Unknown Source)
at java.net.Socket.connect(Unknown Source)
at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:231)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:73)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:936)

and this:

2014-02-10 18:55:05,508 [myid:234] - INFO  [QuorumPeer[myid=234]/0:0:0:0:0:0:0:0:2181:Learner@442] - Learner received UPTODATE message
2014-02-10 18:55:05,508 [myid:234] - WARN  [QuorumPeer[myid=234]/0:0:0:0:0:0:0:0:2181:Follower@92] - Exception when following the leader
java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(Unknown Source)
at java.net.SocketOutputStream.write(Unknown Source)
at java.io.BufferedOutputStream.flushBuffer(Unknown Source)
at java.io.BufferedOutputStream.flush(Unknown Source)
at org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:145)
at org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:477)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:936)

On the leader side, we have this:

2014-02-10 19:48:03,705 [myid:235] - INFO  [LearnerHandler-/] - Synchronizing with Follower sid: 234 maxCommittedLog=0x4afe00000001 minCommittedLog=0x4afe00000001 peerLastZxid=0x4afd00000001
2014-02-10 19:48:03,705 [myid:235] - WARN  [LearnerHandler-/] - Unhandled proposal scenario
2014-02-10 19:48:03,705 [myid:235] - INFO  [LearnerHandler-/] - Sending SNAP
2014-02-10 19:48:03,705 [myid:235] - INFO  [LearnerHandler-/] - Sending snapshot last zxid of peer is 0x4afd00000001  zxid of leader is 0x4aff00000000sent zxid of db as 0x4afe00000001
2014-02-10 19:48:03,724 [myid:235] - WARN  [LearnerHandler-/] - Commiting zxid 0x4aff00000000 from / not first!

There are a couple of odd warnings there. Just to confirm, the node missing in the logs is the one with the bad disk, right?
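For reference, a zxid packs the leader epoch into the upper 32 bits and a per-epoch transaction counter into the lower 32, which makes the leader-side warnings easier to read: the follower's peerLastZxid (0x4afd00000001) is below the leader's minCommittedLog (0x4afe00000001), so no DIFF from the committed log is possible and the leader falls back to sending a full snapshot. A minimal sketch of that comparison (the class name is made up for illustration):

```java
// Hedged sketch: decompose the zxids quoted in the leader log above,
// assuming the standard ZooKeeper layout (upper 32 bits = epoch,
// lower 32 bits = per-epoch transaction counter).
public class ZxidDecode {
    static long epoch(long zxid)   { return zxid >>> 32; }
    static long counter(long zxid) { return zxid & 0xFFFFFFFFL; }

    public static void main(String[] args) {
        long peerLastZxid    = 0x4afd00000001L; // follower's last seen zxid
        long minCommittedLog = 0x4afe00000001L; // oldest entry in leader's committed log
        long leaderZxid      = 0x4aff00000000L; // leader's current zxid

        System.out.printf("follower epoch=0x%x counter=%d%n",
                epoch(peerLastZxid), counter(peerLastZxid));
        System.out.printf("leader   epoch=0x%x counter=%d%n",
                epoch(leaderZxid), counter(leaderZxid));

        // The follower is behind the oldest entry the leader still holds
        // in its committed log, so a DIFF cannot cover the gap and the
        // leader resorts to a full snapshot (the "Sending SNAP" line).
        System.out.println("needs SNAP: " + (peerLastZxid < minCommittedLog));
    }
}
```

Note the follower is two epochs behind (0x4afd vs. the leader's 0x4aff), which is consistent with repeated elections happening while the follower fails to sync.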


On 12 Feb 2014, at 02:26, Deepak Jagtap <[EMAIL PROTECTED]> wrote:
