I have a 3-node ensemble in production, and after restarting one node it can
no longer rejoin the ensemble. The restarted node (myid 2) logs the following:
2018-01-10 00:49:32,492 [myid:2] - INFO [WorkerSender[myid=2]:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (3, 2)
2018-01-10 00:50:20,342 [myid:2] - WARN [RecvWorker:1:QuorumCnxManager$RecvWorker@780] - Connection broken for id 1, my id = 2, error =
java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:197)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at java.net.SocketInputStream.read(SocketInputStream.java:211)
    at java.io.DataInputStream.readInt(DataInputStream.java:387)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765)
2018-01-10 00:50:20,343 [myid:2] - WARN [RecvWorker:1:QuorumCnxManager$RecvWorker@783] - Interrupting SendWorker
2018-01-10 00:50:20,343 [myid:2] - WARN [SendWorker:1:QuorumCnxManager$SendWorker@697] - Interrupted while waiting for message on queue
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2095)
    at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:389)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:849)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:64)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:685)
2018-01-10 00:50:20,343 [myid:2] - WARN [SendWorker:1:QuorumCnxManager$SendWorker@706] - Send worker leaving thread
2018-01-10 00:50:32,491 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000
2018-01-10 00:50:32,493 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@597] - Notification: 1 (message format version), 2 (n.leader), 0x707e3e9a9 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x7 (n.peerEpoch) LOOKING (my state)
2018-01-10 00:50:32,495 [myid:2] - INFO [WorkerSender[myid=2]:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (3, 2)
2018-01-10 00:51:32,494 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000
2018-01-10 00:51:32,494 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@597] - Notification: 1 (message format version), 2 (n.leader), 0x707e3e9a9 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x7 (n.peerEpoch) LOOKING (my state)
2018-01-10 00:51:32,496 [myid:2] - INFO [WorkerSender[myid=2]:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (3, 2)
2018-01-10 00:52:19,126 [myid:2] - WARN [RecvWorker:1:QuorumCnxManager$RecvWorker@780] - Connection broken for id 1, my id = 2, error =
java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:197)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at java.net.SocketInputStream.read(SocketInputStream.java:211)
    at java.io.DataInputStream.readInt(DataInputStream.java:387)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765)
2018-01-10 00:52:19,127 [myid:2] - WARN [RecvWorker:1:QuorumCnxManager$RecvWorker@783] - Interrupting SendWorker
2018-01-10 00:52:19,127 [myid:2] - WARN [SendWorker:1:QuorumCnxManager$SendWorker@697] - Interrupted while waiting for message on queue
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2095)
    at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:389)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:849)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:64)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:685)
2018-01-10 00:52:19,128 [myid:2] - WARN [SendWorker:1:QuorumCnxManager$SendWorker@706] - Send worker leaving thread
2018-01-10 00:52:32,495 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 60000
2018-01-10 00:52:32,497 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@597] - Notification: 1 (message format version), 2 (n.leader), 0x707e3e9a9 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x7 (n.peerEpoch) LOOKING (my state)
2018-01-10 00:52:32,499 [myid:2] - INFO [WorkerSender[myid=2]:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (3, 2)
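
For reading the notifications in the log: a zxid is a 64-bit value whose high 32 bits are the epoch and whose low 32 bits are a per-epoch counter. A minimal sketch of that split, applied to the n.zxid value from the log (the function name split_zxid is illustrative, not a ZooKeeper API):

```python
def split_zxid(zxid):
    """Split a 64-bit zxid into (epoch, counter)."""
    epoch = zxid >> 32            # high 32 bits
    counter = zxid & 0xFFFFFFFF   # low 32 bits
    return epoch, counter


# n.zxid from the notifications logged by server 2
epoch, counter = split_zxid(0x707e3e9a9)
print(hex(epoch), hex(counter))  # prints "0x7 0x7e3e9a9"
```

The epoch part, 0x7, matches the n.peerEpoch that server 2 reports in the same notification.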
My configuration on all three servers is:
clientPort=2181
dataDir=/var/opt/zookeeper/data
tickTime=2000
autopurge.purgeInterval=24
initLimit=10
syncLimit=5
server.1=10.1.0.122:2888:3888
server.2=10.1.1.75:2888:3888
server.3=10.1.2.221:2888:3888
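
One thing that can be checked from server 2 is whether the quorum (2888) and election (3888) ports of the other two servers are reachable. The following is a minimal sketch of such a check, assuming Python 3 is available on the host; parse_servers, port_open and check_peers are illustrative helper names, not part of ZooKeeper.

```python
import re
import socket

# Matches lines such as "server.2=10.1.1.75:2888:3888"
SERVER_LINE = re.compile(r"^server\.(\d+)=([^:\s]+):(\d+):(\d+)")


def parse_servers(config_text):
    """Return {server id: (host, quorum port, election port)} from zoo.cfg text."""
    servers = {}
    for line in config_text.splitlines():
        match = SERVER_LINE.match(line.strip())
        if match:
            sid, host, quorum_port, election_port = match.groups()
            servers[int(sid)] = (host, int(quorum_port), int(election_port))
    return servers


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_peers(servers, my_id):
    """Probe the quorum and election ports of every server except my_id."""
    results = {}
    for sid, (host, quorum_port, election_port) in servers.items():
        if sid == my_id:
            continue
        results[sid] = {
            "quorum": port_open(host, quorum_port),
            "election": port_open(host, election_port),
        }
    return results
```

Run on server 2, check_peers(parse_servers(open(path).read()), my_id=2) would report reachability per peer, where path is wherever zoo.cfg lives on that host. As far as I know only the current leader listens on the quorum port, so a closed 2888 on a follower is expected; the election port 3888 should be open on every running server.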
Server 3 is currently the leader.
Server 1 is currently a follower.
Server 2 currently cannot rejoin the ensemble.
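
The role of each server can be confirmed with the "srvr" four-letter command on the client port. A minimal sketch, assuming the four-letter commands are enabled on this ZooKeeper version; four_letter_word and parse_mode are illustrative helper names:

```python
import socket


def four_letter_word(host, port=2181, cmd="srvr", timeout=3.0):
    """Send a four-letter command to a ZooKeeper server and return its reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(reply):
    """Return the value of the 'Mode:' line (e.g. 'leader'), or None if absent."""
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None
```

For example, parse_mode(four_letter_word("10.1.2.221")) should return "leader" for server 3. A server that is still in leader election normally replies that it is not currently serving requests, in which case parse_mode returns None.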
The myid files are correctly configured on all three servers. This is a
production cluster, so I would like to know whether there is a way to force
the node back into the ensemble without doing anything drastic that could
cause quorum to be lost.