Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Zookeeper >> mail # dev >> Multiop submitted to non-leader fails


Copy link to this message
-
Re: Multiop submitted to non-leader fails
I actually think I've figured it out. It was a trivial (but really vital)
missing case statement. I'll send you more details later tonight...

On Tue, Jul 12, 2011 at 4:41 PM, Marshall McMullen <
[EMAIL PROTECTED]> wrote:

> I'm trying to debug a really obscure problem we've been seeing integrating
> multiop into our code base.
>
> After a lot of debugging, I've isolated the error scenario down to when the
> multi op gets submitted through any non-leader. The symptoms I see when I
> submit the multi-op to a non-leader is that the multi-op, no matter how
> small (in this case it only has one op in it to create a single zknode)
> always times out. When submitting it to non-leaders, I am seeing a couple
> of key messages in the log files:
>
> 2011-07-12 13:20:59,168 - WARN  [LearnerHandler-/127.0.0.7:53952
> :Leader@507][] - *Commiting zxid 0x100000000 from /127.0.0.7:2182 not
> first!*
>
> 2011-07-12 13:20:59,169 - WARN  [LearnerHandler-/127.0.0.7:53952
> :Leader@509][] - *First is 0           *
>
>
> 2011-07-12 13:20:59,169 - INFO  [LearnerHandler-/127.0.0.7:53952
> :Leader@533][] - Have quorum of supporters; starting up and setting last
> processed zxid: 4294967296
>
> 2011-07-12 13:20:59,379 - INFO  [NIOServerCxn.Factory:/127.0.0.5:2181
> :NIOServerCnxnFactory@197][] - Accepted socket connection from /
> 127.0.0.5:53553
>
> 2011-07-12 13:20:59,381 - WARN  [NIOServerCxn.Factory:/127.0.0.5:2181
> :ZooKeeperServer@807][] - Connection request from old client /
> 127.0.0.5:53553; will be dropped if server is in r-o mode
>
> 2011-07-12 13:20:59,381 - INFO  [NIOServerCxn.Factory:/127.0.0.5:2181
> :ZooKeeperServer@853][] - Client attempting to establish new session at /
> 127.0.0.5:53553
>
> 2011-07-12 13:20:59,385 - INFO  [SyncThread:3:FileTxnLog@195][] - Creating
> new log file: log.100000001
>
>
> 2011-07-12 13:20:59,387 - WARN  [QuorumPeer[myid=1]/127.0.0.5:2181
> :Follower@118][] - Got zxid 0x100000001 expected 0x1
>
>
> 2011-07-12 13:20:59,387 - INFO  [SyncThread:1:FileTxnLog@195][] - Creating
> new log file: log.100000001
>
>
> 2011-07-12 13:20:59,387 - WARN  [QuorumPeer[myid=2]/127.0.0.6:2181
> :Follower@118][] - Got zxid 0x100000001 expected 0x1
>
>
> If I instead submit the exact same multi-op to the leader instead of a
> follower, it succeeds every time.
>
> I'm suspecting there is a bug in how a Multi-op is forwarded between a
> follower and a leader (or vice versa). I'm going to be digging into this
> tonight, but if anyone has any pointers on where to look, I'd be super
> appreciative.
>
> Thanks,
> Marshall
>
>
>