Zookeeper, mail # user - What to do when a node will not join the cluster?

Brian Tarbox 2012-11-19, 17:13
I have a four-node cluster (I know, it should be odd) that generally runs
fine, but this morning I needed to restart the whole cluster and one of the
nodes will not sync. The node asks for a snapshot from the leader, waits
for several minutes(!), and then fails.
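
On the "should be odd" remark: with the default majority quorum, an ensemble needs a strict majority of its voting members to be up, so a four-node ensemble tolerates no more failures than a three-node one. A minimal sketch of that arithmetic follows; the function names are illustrative and not part of any ZooKeeper API, and it assumes plain majority quorums (no weights or hierarchical groups).

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Number of members that can be down while a majority remains."""
    return ensemble_size - quorum_size(ensemble_size)


# Compare ensembles of 3, 4, and 5 voting members.
for n in (3, 4, 5):
    print(f"nodes={n} quorum={quorum_size(n)} tolerated_failures={tolerated_failures(n)}")
```

Under these assumptions, going from three to four nodes raises the quorum size from 2 to 3 without raising the number of failures the ensemble can survive.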

11:46:55,130 [myid:] - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Learner@294] - Getting a snapshot from leader
11:47:01,535 [myid:] - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Learner@325] - Setting leader epoch e
11:47:21,707 [myid:] - WARN  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Learner@341] - Got zxid 0xe0000000a expected 0x1
11:55:01,515 [myid:] - WARN  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Follower@82] - Exception when following the leader
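
For reading the zxid values in the log above: a ZooKeeper zxid is a 64-bit number whose high 32 bits are the leader epoch and whose low 32 bits are a counter within that epoch. The sketch below splits the two values from the WARN line; `split_zxid` is a hypothetical helper name, not a ZooKeeper function.

```python
from typing import Tuple


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Split a 64-bit zxid into (epoch, counter)."""
    epoch = zxid >> 32            # high 32 bits
    counter = zxid & 0xFFFFFFFF   # low 32 bits
    return epoch, counter


# Values taken from the "Got zxid ... expected ..." line.
for label, zxid in (("got", 0xE0000000A), ("expected", 0x1)):
    epoch, counter = split_zxid(zxid)
    print(f"{label}: epoch=0x{epoch:x} counter=0x{counter:x}")
```

Decoded this way, the received zxid belongs to epoch 0xe (matching the "Setting leader epoch e" line), while the expected value 0x1 has epoch 0.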

On the leader side, it appears to send the snapshot and then fail.
I have no idea how to proceed; any suggestions appreciated.

11:46:55,129 [myid:5] - INFO  [LearnerHandler-/:LearnerHandler@318] - Synchronizing with Follower sid: 4 maxCommittedLog=0xe00000009 minCommittedLog=0xe00000001
11:46:55,129 [myid:5] - WARN  [LearnerHandler-/:LearnerHandler@379] - Unhandled proposal scenario
11:46:55,129 [myid:5] - INFO  [LearnerHandler-/:LearnerHandler@395] - Sending SNAP
11:46:55,129 [myid:5] - INFO  [LearnerHandler-/:LearnerHandler@419] - Sending snapshot last zxid of peer is 0x900323414  zxid of leader is 0xe00000009sent zxid of db as 0xe00000009
11:55:01,513 [myid:5] - ERROR [LearnerHandler-/:LearnerHandler@562] - Unexpected exception causing shutdown while sock still open
java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(Unknown Source)
        at java.net.SocketInputStream.read(Unknown Source)
        at java.io.BufferedInputStream.fill(Unknown Source)
        at java.io.BufferedInputStream.read(Unknown Source)
        at java.io.DataInputStream.readInt(Unknown Source)
11:55:01,513 [myid:5] - WARN  [LearnerHandler-/:LearnerHandler@575] - ******* GOODBYE / ********
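
One possible reading of why the leader logs "Sending SNAP": the follower's last zxid (0x900323414) is older than the leader's minCommittedLog (0xe00000001), so it falls outside the range the leader could replay as a diff. The sketch below is a simplified illustration of that decision, not a copy of the actual LearnerHandler code; `choose_sync_mode` is a hypothetical name, and the DIFF/TRUNC/SNAP rules shown are an assumption based on the log messages.

```python
def choose_sync_mode(peer_last_zxid: int,
                     min_committed_log: int,
                     max_committed_log: int) -> str:
    """Pick a sync strategy for a follower (simplified, assumed rules)."""
    if min_committed_log <= peer_last_zxid <= max_committed_log:
        return "DIFF"   # follower is inside the leader's in-memory log window
    if peer_last_zxid > max_committed_log:
        return "TRUNC"  # follower is ahead of the leader and must roll back
    return "SNAP"       # follower is too far behind; send a full snapshot


# Values taken from the leader log lines above.
mode = choose_sync_mode(
    peer_last_zxid=0x900323414,
    min_committed_log=0xE00000001,
    max_committed_log=0xE00000009,
)
print(mode)
```

With the values from the log, this sketch selects a full snapshot, which is consistent with the "Sending SNAP" line.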
Diego Oliveira 2012-11-20, 11:57