What to do when a node will not join the cluster?
Brian Tarbox 2012-11-19, 17:13
I have a four-node cluster (I know, the node count should be odd) that generally runs fine, but this morning I needed to restart the whole cluster and one of the nodes will not sync. The node asks the leader for a snapshot, waits for several minutes (!), and then fails.

11:46:55,130 [myid:] - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Learner@294] - Getting a snapshot from leader
11:47:01,535 [myid:] - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Learner@325] - Setting leader epoch e
11:47:21,707 [myid:] - WARN [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Learner@341] - Got zxid 0xe0000000a expected 0x1
11:55:01,515 [myid:] - WARN [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Follower@82] - Exception when following the leader
java.io.EOFException

On the leader side, it appears to start sending the snapshot and then fails about eight minutes later with a read timeout. I have no idea how to proceed; any suggestions appreciated.

11:46:55,129 [myid:5] - INFO [LearnerHandler-/172.16.10.200:46021 :LearnerHandler@318] - Synchronizing with Follower sid: 4 maxCommittedLog=0xe00000009 minCommittedLog=0xe00000001 peerLastZxid=0x900323414
11:46:55,129 [myid:5] - WARN [LearnerHandler-/172.16.10.200:46021 :LearnerHandler@379] - Unhandled proposal scenario
11:46:55,129 [myid:5] - INFO [LearnerHandler-/172.16.10.200:46021 :LearnerHandler@395] - Sending SNAP
11:46:55,129 [myid:5] - INFO [LearnerHandler-/172.16.10.200:46021 :LearnerHandler@419] - Sending snapshot last zxid of peer is 0x900323414 zxid of leader is 0xe00000009sent zxid of db as 0xe00000009
11:55:01,513 [myid:5] - ERROR [LearnerHandler-/172.16.10.200:46021 :LearnerHandler@562] - Unexpected exception causing shutdown while sock still open
java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(Unknown Source)
    at java.net.SocketInputStream.read(Unknown Source)
    at java.io.BufferedInputStream.fill(Unknown Source)
    at java.io.BufferedInputStream.read(Unknown Source)
    at java.io.DataInputStream.readInt(Unknown Source)
    at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
    at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
    at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
    at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:450)
11:55:01,513 [myid:5] - WARN [LearnerHandler-/172.16.10.200:46021 :LearnerHandler@575] - ******* GOODBYE /172.16.10.200:46021 ********
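
As a reading aid for the zxids in the leader log: a ZooKeeper zxid is a 64-bit value whose high 32 bits hold the leader epoch and whose low 32 bits hold a per-epoch counter. The sketch below splits the three values from the "Synchronizing with Follower" line; the helper name `split_zxid` is hypothetical and not part of any ZooKeeper API.

```python
def split_zxid(zxid):
    """Split a 64-bit zxid into (epoch, counter).

    Hypothetical helper for reading logs; not a ZooKeeper API.
    """
    epoch = zxid >> 32            # high 32 bits: leader epoch
    counter = zxid & 0xFFFFFFFF   # low 32 bits: counter within the epoch
    return epoch, counter


# Values copied from the leader's "Synchronizing with Follower" log line.
zxids = {
    "peerLastZxid": 0x900323414,
    "minCommittedLog": 0xe00000001,
    "maxCommittedLog": 0xe00000009,
}

for name, zxid in zxids.items():
    epoch, counter = split_zxid(zxid)
    print(f"{name}: epoch={epoch:#x} counter={counter:#x}")
```

Read this way, the follower's last zxid belongs to epoch 0x9 while the leader's committed log covers only epoch 0xe, which seems consistent with the leader sending a full snapshot (SNAP) instead of a diff. It does not by itself explain why the transfer then times out.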