Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Zookeeper >> mail # dev >> Re: new leader accepting create requests too early?


Copy link to this message
-
Re: new leader accepting create requests too early?
Hi Jeremy, Thanks for reporting. Without logs from the other nodes, it  
is somewaht difficult to determine more precisely what happened. But,  
from the logs, it sounds like node #1 (id 215) is the only leading  
from these two notifications:

2672 [WorkerReceiver Thread] INFO
org.apache.zookeeper.server.quorum.FastLeaderElection  - Notification:
215 (n.leader), 12884902548 (n.zxid), 3 (n.round), LEADING (n.state),
215 (n.sid), LOOKING (my state)

2672 [WorkerReceiver Thread] INFO
org.apache.zookeeper.server.quorum.FastLeaderElection  - Notification:
215 (n.leader), 12884902548 (n.zxid), 3 (n.round), FOLLOWING
(n.state), 126 (n.sid), LEADING (my state)

 From the information you have given, I can't explain right now why  
the sequence number is smaller in step 12. If you happen to have more  
information, it would be great if you could report in a jira.

-Flavio

PS: We should perhaps discuss this issue on dev, so I'm copying it to  
dev.

On Mar 22, 2011, at 3:15 AM, Jeremy Stribling wrote:

> Hi all,
>
> I ran into a weird case where Zookeeper seems to have elected a new
> leader (a node that just restarted and missed several operations),  
> and a
> client is able to connect to it and create a new sequential node that
> has a number earlier than the last node it created.  I don't have full
> logs, or a live system in this state, or any data directories, just  
> some
> partial server logs and the evidence as seen by the client.  Haven't
> tried reproducing it yet, just wanted to see if anyone here had any
> ideas.  Here's the scenario (probably more info than necessary, but
> trying to be complete)
>
> Version: Zookeeper 3.3.3
>
> 1) Initially (5:37:20): 3 nodes up, with ids 215, 126, and 37 (called
> nodes #1, #2, and #3 below):
> 2) Nodes periodically (and throughout this whole timeline) create
> sequential, non-ephemeral nodes under the /zkrsm parent node.
> 3) 5:46:57: Node #1 gets notified of
> /zkrsm/0000000000000000_record0000002116
> 4) 5:47:06: Node #1 restarts and rejoins
> 5) 5:49:26: Node #2 gets notified of
> /zkrsm/0000000000000000_record0000002708
> 6) 5:49:29: Node #2 restarts and rejoins
> 7) 5:52:01: Node #3 gets notified of
> /zkrsm/0000000000000000_record0000003291
> 8) 5:52:02: Node #3 restarts and begins the rejoining process
> 9) 5:52:08: Node #1 successfully creates
> /zkrsm/0000000000000000_record0000003348
> 10) 5:52:08: Node #2 dies after getting notified of
> /zkrsm/0000000000000000_record0000003348
> 11) 5:52:10ish: Node #3 is elected leader (the ZK server log doesn't
> have wallclock timestamps, so not exactly sure on the ordering of  
> this step)
> 12) 5:52:15: Node #1 successfully creates
> /zkrsm/0000000000000000_record0000003292
>
> Note that the node created in step #12 is lower than the one created  
> in
> step #9, and is exactly one greater than the last node seen by node #3
> before it restarted.
>
> The leader election bit from node #3's log after restarting might be  
> of
> interest:
>
>>
>> 2644 [QuorumPeer:/0.0.0.0:2888] INFO
>> org.apache.zookeeper.server.quorum.QuorumPeer  - LOOKING
>> 2644 [QuorumPeer:/0.0.0.0:2888] DEBUG
>> org.apache.zookeeper.server.quorum.QuorumPeer  - Initializing leader
>> election protocol...
>> 2647 [QuorumPeer:/0.0.0.0:2888] INFO
>> org.apache.zookeeper.server.quorum.FastLeaderElection  - New  
>> election.
>> My id =  37, Proposed zxid = 17179869831
>> 2650 [WorkerSender Thread] DEBUG
>> org.apache.zookeeper.server.quorum.QuorumCnxManager  - Opening  
>> channel
>> to server 126
>> 2653 [WorkerReceiver Thread] DEBUG
>> org.apache.zookeeper.server.quorum.FastLeaderElection  - Receive new
>> notification message. My id = 37
>> 2653 [WorkerReceiver Thread] INFO
>> org.apache.zookeeper.server.quorum.FastLeaderElection  -  
>> Notification:
>> 37 (n.leader), 17179869831 (n.zxid), 1 (n.round), LOOKING (n.state),
>> 37 (n.sid), LOOKING (my state)
>> 2654 [QuorumPeer:/0.0.0.0:2888] DEBUG
>> org.apache.zookeeper.server.quorum.FastLeaderElection  - id: 37,
>> proposed id: 37, zxid: 17179869831, proposed zxid: 17179869831

flavio
junqueira

research scientist

[EMAIL PROTECTED]
direct +34 93-183-8828

avinguda diagonal 177, 8th floor, barcelona, 08018, es
phone (408) 349 3300    fax (408) 349 3301
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB