Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Plain View
Zookeeper >> mail # dev >> Multiop submitted to non-leader fails


+
Marshall McMullen 2011-07-12, 22:41
Copy link to this message
-
Re: Multiop submitted to non-leader fails
I actually think I've figured it out. It was a trivial (but really vital)
missing case statement. I'll send you more details later tonight...

On Tue, Jul 12, 2011 at 4:41 PM, Marshall McMullen <
[EMAIL PROTECTED]> wrote:

> I'm trying to debug a really obscure problem we've been seeing integrating
> multiop into our code base.
>
> After a lot of debugging, I've isolated the error scenario down to when the
> multi op gets submitted through any non-leader. The symptoms I see when I
> submit the multi-op to a non-leader is that the multi-op, no matter how
> small (in this case it only has one op in it to create a single zknode)
> always times out. When submitting it to non-leaders, I am seeing a couple
> of key messages in the log files:
>
> 2011-07-12 13:20:59,168 - WARN  [LearnerHandler-/127.0.0.7:53952
> :Leader@507][] - *Commiting zxid 0x100000000 from /127.0.0.7:2182 not
> first!*
>
> 2011-07-12 13:20:59,169 - WARN  [LearnerHandler-/127.0.0.7:53952
> :Leader@509][] - *First is 0           *
>
>
> 2011-07-12 13:20:59,169 - INFO  [LearnerHandler-/127.0.0.7:53952
> :Leader@533][] - Have quorum of supporters; starting up and setting last
> processed zxid: 4294967296
>
> 2011-07-12 13:20:59,379 - INFO  [NIOServerCxn.Factory:/127.0.0.5:2181
> :NIOServerCnxnFactory@197][] - Accepted socket connection from /
> 127.0.0.5:53553
>
> 2011-07-12 13:20:59,381 - WARN  [NIOServerCxn.Factory:/127.0.0.5:2181
> :ZooKeeperServer@807][] - Connection request from old client /
> 127.0.0.5:53553; will be dropped if server is in r-o mode
>
> 2011-07-12 13:20:59,381 - INFO  [NIOServerCxn.Factory:/127.0.0.5:2181
> :ZooKeeperServer@853][] - Client attempting to establish new session at /
> 127.0.0.5:53553
>
> 2011-07-12 13:20:59,385 - INFO  [SyncThread:3:FileTxnLog@195][] - Creating
> new log file: log.100000001
>
>
> 2011-07-12 13:20:59,387 - WARN  [QuorumPeer[myid=1]/127.0.0.5:2181
> :Follower@118][] - Got zxid 0x100000001 expected 0x1
>
>
> 2011-07-12 13:20:59,387 - INFO  [SyncThread:1:FileTxnLog@195][] - Creating
> new log file: log.100000001
>
>
> 2011-07-12 13:20:59,387 - WARN  [QuorumPeer[myid=2]/127.0.0.6:2181
> :Follower@118][] - Got zxid 0x100000001 expected 0x1
>
>
> If I instead submit the exact same multi-op to the leader instead of a
> follower, it succeeds every time.
>
> I'm suspecting there is a bug in how a Multi-op is forwarded between a
> follower and a leader (or vice versa). I'm going to be digging into this
> tonight, but if anyone has any pointers on where to look, I'd be super
> appreciative.
>
> Thanks,
> Marshall
>
>
>
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB