Zookeeper, mail # user - What to do when a node will not join the cluster?


What to do when a node will not join the cluster?
Brian Tarbox 2012-11-19, 17:13
I have a four-node cluster (I know, the node count should be odd) that
generally runs fine, but this morning I needed to restart the whole cluster
and one of the nodes will not sync. The node asks the leader for a
snapshot, waits several minutes (!), and then fails.

11:46:55,130 [myid:] - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Learner@294] - Getting a snapshot from leader
11:47:01,535 [myid:] - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Learner@325] - Setting leader epoch e
11:47:21,707 [myid:] - WARN  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Learner@341] - Got zxid 0xe0000000a expected 0x1
11:55:01,515 [myid:] - WARN  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Follower@82] - Exception when following the leader
java.io.EOFException

On the leader side, it appears to start sending the snapshot and then
fails. I have no idea how to proceed; any suggestions appreciated.

11:46:55,129 [myid:5] - INFO  [LearnerHandler-/172.16.10.200:46021:LearnerHandler@318] - Synchronizing with Follower sid: 4 maxCommittedLog=0xe00000009 minCommittedLog=0xe00000001 peerLastZxid=0x900323414
11:46:55,129 [myid:5] - WARN  [LearnerHandler-/172.16.10.200:46021:LearnerHandler@379] - Unhandled proposal scenario
11:46:55,129 [myid:5] - INFO  [LearnerHandler-/172.16.10.200:46021:LearnerHandler@395] - Sending SNAP
11:46:55,129 [myid:5] - INFO  [LearnerHandler-/172.16.10.200:46021:LearnerHandler@419] - Sending snapshot last zxid of peer is 0x900323414  zxid of leader is 0xe00000009sent zxid of db as 0xe00000009
11:55:01,513 [myid:5] - ERROR [LearnerHandler-/172.16.10.200:46021:LearnerHandler@562] - Unexpected exception causing shutdown while sock still open
java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(Unknown Source)
        at java.net.SocketInputStream.read(Unknown Source)
        at java.io.BufferedInputStream.fill(Unknown Source)
        at java.io.BufferedInputStream.read(Unknown Source)
        at java.io.DataInputStream.readInt(Unknown Source)
        at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
        at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
        at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
        at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:450)
11:55:01,513 [myid:5] - WARN  [LearnerHandler-/172.16.10.200:46021:LearnerHandler@575] - ******* GOODBYE /172.16.10.200:46021 ********
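
For anyone reading the zxids in the logs above: a ZooKeeper zxid is a
64-bit number whose high 32 bits are the leader epoch and whose low 32 bits
are a counter within that epoch. The sketch below is only an illustration
for decoding the values that appear in this post; the helper name
split_zxid is made up here and is not part of ZooKeeper.

```python
from typing import Tuple


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Split a 64-bit zxid into (epoch, counter).

    Hypothetical helper for illustration; not a ZooKeeper API.
    """
    epoch = zxid >> 32             # high 32 bits: leader epoch
    counter = zxid & 0xFFFFFFFF    # low 32 bits: counter within the epoch
    return epoch, counter


# zxids copied from the follower and leader logs in this message
logged = [
    ("peerLastZxid", 0x900323414),
    ("minCommittedLog", 0xE00000001),
    ("maxCommittedLog", 0xE00000009),
    ("zxid the follower got", 0xE0000000A),
]

for label, zxid in logged:
    epoch, counter = split_zxid(zxid)
    print(f"{label}: epoch=0x{epoch:x} counter=0x{counter:x}")
```

Decoded this way, the follower's last zxid is in epoch 0x9 while the
leader's committed log covers only epoch 0xe, which would be consistent
with the leader falling back to a full snapshot ("Sending SNAP") instead of
a diff. That reading is an inference from the logs, not a confirmed
diagnosis.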