Zookeeper, mail # user - Node not joining ensemble


Jordan Zimmerman 2011-10-21, 22:57
Jordan Zimmerman 2011-10-21, 23:34
Jordan Zimmerman 2011-10-22, 01:13
Re: Node not joining ensemble
Flavio Junqueira 2011-10-23, 10:36
Here is my interpretation after reading the logs:

1- Node 3 was restarted and initiated leader election for round 1;
2- Node 3 received a notification from node 1 saying that node 1 is
the leader, but it didn't get a confirmation from a quorum. Since
node 3 has a higher id and zxid, it does not change its mind about
who should be the leader: itself;
3- Node 3 didn't receive a notification from node 2 showing that a
quorum supports node 1, so node 3 sticks to its vote.

It sounds like a bug to me, so I suggest you report it in JIRA.
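
The reasoning in the three steps above can be sketched in code. This is only an illustration, not ZooKeeper's actual implementation: it assumes votes are ordered by zxid first and server id second, that a leader is accepted only once a strict majority of the ensemble is seen to back it, and that the ensemble has three servers (the log only mentions servers 1, 2 and 3). The names `Vote`, `is_better` and `has_quorum` are hypothetical; the numbers come from the DEBUG log quoted below.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    leader: int  # server id being proposed as leader
    zxid: int    # zxid advertised together with that proposal


def is_better(new: Vote, current: Vote) -> bool:
    """Assumed ordering: higher zxid wins, ties go to the higher server id."""
    return (new.zxid, new.leader) > (current.zxid, current.leader)


def has_quorum(supporters: set, ensemble_size: int) -> bool:
    """True when a strict majority of the ensemble supports the vote."""
    return len(supporters) > ensemble_size // 2


# Values taken from the quoted log: node 3 votes for itself,
# node 1 announces itself as leader with a lower zxid.
own_vote = Vote(leader=3, zxid=12885265585)
from_node_1 = Vote(leader=1, zxid=8589935532)

# Node 3 has only heard node 1 claim leadership; nothing from node 2.
supporters_of_1 = {1}

# Assumption: a three-server ensemble.
switch_vote = is_better(from_node_1, own_vote)
follow_leader = has_quorum(supporters_of_1, ensemble_size=3)

print(switch_vote)    # False
print(follow_leader)  # False
```

Under these assumptions node 3 neither adopts node 1's vote nor accepts node 1 as an established leader, which matches the loop seen in the log. A notification from node 2 backing node 1 would make `has_quorum({1, 2}, 3)` true.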

-Flavio

On Oct 22, 2011, at 3:13 AM, Jordan Zimmerman wrote:

> Interesting. I restarted Server 2 in the ensemble and the problem
> cleared itself.
>
> -JZ
>
> On 10/21/11 4:34 PM, "Jordan Zimmerman" <[EMAIL PROTECTED]> wrote:
>
>> FYI - I turned on DEBUG and here's more log info:
>>
>> 2011-10-21 23:33:06,732 - DEBUG [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@510] - id: 3, proposed id: 3, zxid: 12885265585, proposed zxid: 12885265585
>> 2011-10-21 23:33:06,732 - DEBUG [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@727] - Adding vote: From = 3, Proposed leader = 3, Porposed zxid = 12885265585, Proposed epoch = 1
>> 2011-10-21 23:33:06,734 - DEBUG [WorkerReceiver Thread:FastLeaderElection$Messenger$WorkerReceiver@214] - Receive new notification message. My id = 3
>> 2011-10-21 23:33:06,735 - INFO  [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 1 (n.leader), 8589935532 (n.zxid), 3 (n.round), LEADING (n.state), 1 (n.sid), LOOKING (my state)
>> 2011-10-21 23:33:08,336 - DEBUG [QuorumPeer:/0.0.0.0:2181:QuorumCnxManager@414] - Queue size: 0
>> 2011-10-21 23:33:08,336 - INFO  [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@697] - Notification time out: 3200
>> 2011-10-21 23:33:08,336 - DEBUG [WorkerSender Thread:QuorumCnxManager@389] - There is a connection already for server 1
>> 2011-10-21 23:33:08,337 - DEBUG [WorkerSender Thread:QuorumCnxManager@389] - There is a connection already for server 2
>> 2011-10-21 23:33:08,337 - DEBUG [WorkerReceiver Thread:FastLeaderElection$Messenger$WorkerReceiver@214] - Receive new notification message. My id = 3
>> 2011-10-21 23:33:08,337 - INFO  [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 12885265585 (n.zxid), 1 (n.round), LOOKING (n.state), 3 (n.sid), LOOKING (my state)
>> 2011-10-21 23:33:08,337 - DEBUG [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@510] - id: 3, proposed id: 3, zxid: 12885265585, proposed zxid: 12885265585
>> 2011-10-21 23:33:08,337 - DEBUG [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@727] - Adding vote: From = 3, Proposed leader = 3, Porposed zxid = 12885265585, Proposed epoch = 1
>> 2011-10-21 23:33:08,339 - DEBUG [WorkerReceiver Thread:FastLeaderElection$Messenger$WorkerReceiver@214] - Receive new notification message. My id = 3
>> 2011-10-21 23:33:08,339 - INFO  [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 1 (n.leader), 8589935532 (n.zxid), 3 (n.round), LEADING (n.state), 1 (n.sid), LOOKING (my state)
>> 2011-10-21 23:33:11,540 - DEBUG [QuorumPeer:/0.0.0.0:2181:QuorumCnxManager@414] - Queue size: 0
>> 2011-10-21 23:33:11,540 - INFO  [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@697] - Notification time out: 6400
>> 2011-10-21 23:33:11,540 - DEBUG [WorkerSender Thread:QuorumCnxManager@389] - There is a connection already for server 1
>> 2011-10-21 23:33:11,541 - DEBUG [WorkerSender Thread:QuorumCnxManager@389] - There is a connection already for server 2
>> 2011-10-21 23:33:11,541 - DEBUG [WorkerReceiver Thread:FastLeaderElection$Messenger$WorkerReceiver@214] - Receive new notification message. My id = 3
>> 2011-10-21 23:33:11,541 - INFO  [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 12885265585 (n.zxid), 1 (n.round), LOOKING (n.state), 3 (n.sid), LOOKING (my state)
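
As an aid to reading the zxid values in this log: a ZooKeeper zxid is a 64-bit number whose high 32 bits hold the epoch and whose low 32 bits hold a counter. The short sketch below splits the two zxids that appear in the notifications; the helper name `split_zxid` is hypothetical and not part of ZooKeeper.

```python
def split_zxid(zxid: int) -> tuple:
    """Split a 64-bit zxid into (epoch, counter)."""
    epoch = zxid >> 32            # high 32 bits
    counter = zxid & 0xFFFFFFFF   # low 32 bits
    return epoch, counter


node3_zxid = split_zxid(12885265585)  # zxid node 3 proposes for itself
node1_zxid = split_zxid(8589935532)   # zxid in node 1's LEADING notification

print(node3_zxid)  # (3, 363697)
print(node1_zxid)  # (2, 940)
```

Read this way, the zxid node 3 proposes is one epoch ahead of the zxid that node 1 advertises, which is consistent with node 3 refusing to change its vote.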

flavio
junqueira

research scientist
