Zookeeper >> mail # dev >> Multiop submitted to non-leader fails


Multiop submitted to non-leader fails
I'm trying to debug a really obscure problem we've been seeing while
integrating multi-op into our code base.

After a lot of debugging, I've narrowed the failure down to the case where
the multi-op is submitted through any non-leader. The symptom is that the
multi-op, no matter how small (in this case it contains a single op that
creates one znode), always times out. When submitting it to non-leaders, I
see a couple of key messages in the log files:

2011-07-12 13:20:59,168 - WARN  [LearnerHandler-/127.0.0.7:53952:Leader@507][] - *Commiting zxid 0x100000000 from /127.0.0.7:2182 not first!*
2011-07-12 13:20:59,169 - WARN  [LearnerHandler-/127.0.0.7:53952:Leader@509][] - *First is 0*
2011-07-12 13:20:59,169 - INFO  [LearnerHandler-/127.0.0.7:53952:Leader@533][] - Have quorum of supporters; starting up and setting last processed zxid: 4294967296

2011-07-12 13:20:59,379 - INFO  [NIOServerCxn.Factory:/127.0.0.5:2181:NIOServerCnxnFactory@197][] - Accepted socket connection from /127.0.0.5:53553

2011-07-12 13:20:59,381 - WARN  [NIOServerCxn.Factory:/127.0.0.5:2181:ZooKeeperServer@807][] - Connection request from old client /127.0.0.5:53553; will be dropped if server is in r-o mode

2011-07-12 13:20:59,381 - INFO  [NIOServerCxn.Factory:/127.0.0.5:2181:ZooKeeperServer@853][] - Client attempting to establish new session at /127.0.0.5:53553

2011-07-12 13:20:59,385 - INFO  [SyncThread:3:FileTxnLog@195][] - Creating new log file: log.100000001
2011-07-12 13:20:59,387 - WARN  [QuorumPeer[myid=1]/127.0.0.5:2181:Follower@118][] - Got zxid 0x100000001 expected 0x1
2011-07-12 13:20:59,387 - INFO  [SyncThread:1:FileTxnLog@195][] - Creating new log file: log.100000001
2011-07-12 13:20:59,387 - WARN  [QuorumPeer[myid=2]/127.0.0.6:2181:Follower@118][] - Got zxid 0x100000001 expected 0x1
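
For anyone reading the zxids in these log lines: a ZooKeeper zxid is a 64-bit value with the epoch in the high 32 bits and a per-epoch counter in the low 32 bits. The sketch below only decodes the values that appear in the log; ZxidDecoder and its methods are hypothetical helper names for illustration, not part of the ZooKeeper API, and the decoding by itself does not explain the timeout.

```java
// Hypothetical helper (not part of ZooKeeper) that splits a zxid into
// its epoch (high 32 bits) and counter (low 32 bits).
public final class ZxidDecoder {
    private ZxidDecoder() {}

    /** Epoch: the upper 32 bits of the zxid. */
    public static long epoch(long zxid) {
        return zxid >>> 32;
    }

    /** Counter: the lower 32 bits of the zxid. */
    public static long counter(long zxid) {
        return zxid & 0xFFFFFFFFL;
    }

    /** Human-readable form of a zxid for comparing log values. */
    public static String describe(long zxid) {
        return String.format("zxid=0x%x epoch=%d counter=%d",
                zxid, epoch(zxid), counter(zxid));
    }

    public static void main(String[] args) {
        // Values copied from the log lines above.
        System.out.println(describe(4294967296L));  // "last processed zxid"
        System.out.println(describe(0x100000001L)); // zxid the followers got
        System.out.println(describe(0x1L));         // zxid the followers expected
    }
}
```

Decoded this way, 4294967296 is the same value as 0x100000000 (epoch 1, counter 0), 0x100000001 is the first transaction of epoch 1, and the "expected 0x1" value carries epoch 0 with counter 1.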

If I submit the exact same multi-op to the leader rather than to a
follower, it succeeds every time.

I suspect there is a bug in how a multi-op is forwarded between a
follower and the leader (or vice versa). I'm going to be digging into this
tonight, but if anyone has any pointers on where to look, I'd be super
appreciative.

Thanks,
Marshall
Marshall McMullen 2011-07-12, 23:31