Kafka dev mailing list


Conditional leader and ISR update fails forever
I'm not quite sure how we get into this state, but we've seen this a few times now. Basically, one of our brokers (broker 1 in this case) gets into a state where ISR updates fail forever:

[2013-10-16 06:19:12,448] ERROR Conditional update of path /brokers/topics/search-gateway-wal/partitions/5/state with data { "controller_epoch":62, "isr":[ 1, 3 ], "leader":1, "leader_epoch":61, "version":1 } and expected version 125 failed due to org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /brokers/topics/search-gateway-wal/partitions/5/state (kafka.utils.ZkUtils$)
[2013-10-16 06:19:12,448] INFO Partition [search-gateway-wal,5] on broker 1: Cached zkVersion [125] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
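To make the failing call concrete: the ISR write is an optimistic, version-checked update. ZooKeeper's setData only succeeds if the version the caller passes matches the znode's current version, and every successful write bumps that version. The following is a hypothetical in-memory model, not Kafka or ZooKeeper code: FakeZnode, BadVersionError and conditional_update are made-up names, and returning (succeeded, new_version) with -1 on failure is an assumption chosen to mirror the shape of the log lines above.

```python
from typing import Tuple


class BadVersionError(Exception):
    """Hypothetical stand-in for ZooKeeper's KeeperException.BadVersionException."""


class FakeZnode:
    """Hypothetical in-memory znode: a data payload plus a version counter."""

    def __init__(self, data: dict) -> None:
        self.data = data
        self.version = 0  # ZooKeeper znodes start at version 0 when created

    def set_data(self, data: dict, expected_version: int) -> int:
        # ZooKeeper-style check: -1 means "any version", otherwise it must match.
        if expected_version != -1 and expected_version != self.version:
            raise BadVersionError(
                f"expected version {expected_version}, znode is at {self.version}")
        self.data = data
        self.version += 1  # every successful write bumps the version
        return self.version


def conditional_update(znode: FakeZnode, data: dict,
                       expected_version: int) -> Tuple[bool, int]:
    """Attempt a versioned write; return (succeeded, new_version), -1 on failure."""
    try:
        return True, znode.set_data(data, expected_version)
    except BadVersionError:
        return False, -1


if __name__ == "__main__":
    state = FakeZnode({"leader": 1, "isr": [1, 3], "leader_epoch": 61})
    cached_version = state.version  # the broker remembers version 0

    # Another writer rewrites the state, moving the znode to version 1.
    state.set_data({"leader": 1, "isr": [1], "leader_epoch": 61}, state.version)

    ok, new_version = conditional_update(
        state, {"leader": 1, "isr": [1, 3], "leader_epoch": 61}, cached_version)
    print(ok, new_version)  # False -1
```

Whoever holds a stale version keeps getting BadVersion until it learns the znode's current version; the write itself is never the problem.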

This repeats over and over again for a subset of the partitions. Looking at other brokers in the cluster, it seems that they think broker 1 is also the controller and that the partition in this example has the following state:

(search-gateway-wal,5) -> (LeaderAndIsrInfo:(Leader:1, ISR:1,LeaderEpoch:61,ControllerEpoch:62),ReplicationFactor:3),AllReplicas:1,3,4)

Looking at the code in Partition, it seems that the cached zkVersion is only ever updated in makeFollower/makeLeader, so once it falls behind the version in ZooKeeper, every conditional ISR update fails.
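If the cached version only moves when the controller hands the broker fresh state, a single out-of-band write leaves the broker retrying with a stale version indefinitely. A hypothetical Python sketch of that behaviour follows. PartitionSketch, make_leader and update_isr are illustrative names, not Kafka's Partition API; the sketch assumes the cached version is also bumped after the broker's own successful writes (otherwise even normal updates would fail) and that nothing re-reads ZooKeeper after a BadVersion failure.

```python
class BadVersionError(Exception):
    """Hypothetical stand-in for ZooKeeper's BadVersionException."""


class FakeZnode:
    """Hypothetical znode with ZooKeeper-style versioned writes."""

    def __init__(self, data):
        self.data = data
        self.version = 0

    def set_data(self, data, expected_version):
        if expected_version != self.version:
            raise BadVersionError(expected_version)
        self.data = data
        self.version += 1
        return self.version


class PartitionSketch:
    """Hypothetical broker-side view of one partition.

    The cached version changes only in make_leader() or after one of this
    broker's own writes succeeds; a BadVersion failure never refreshes it.
    """

    def __init__(self, znode):
        self.znode = znode
        self.cached_zk_version = None
        self.isr = []

    def make_leader(self, isr, zk_version):
        # Assumption: the controller's request carries the current znode version.
        self.isr = list(isr)
        self.cached_zk_version = zk_version

    def update_isr(self, new_isr):
        try:
            new_version = self.znode.set_data({"isr": list(new_isr)},
                                              self.cached_zk_version)
        except BadVersionError:
            # Like the INFO line in the log: skip the update, keep the stale version.
            return False
        self.isr = list(new_isr)
        self.cached_zk_version = new_version
        return True


if __name__ == "__main__":
    znode = FakeZnode({"isr": [1]})
    p = PartitionSketch(znode)
    p.make_leader([1], znode.version)            # cache version 0

    znode.set_data({"isr": [1]}, znode.version)  # out-of-band write -> version 1

    print([p.update_isr([1, 3]) for _ in range(3)])  # [False, False, False]

    p.make_leader([1], znode.version)            # fresh version from the controller
    print(p.update_isr([1, 3]))                  # True
```

In this model only a new make_leader call carrying the current version breaks the loop, which is consistent with the failures repeating indefinitely in the log.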

Any ideas on how we may have gotten into this state?

/Sam
 
Jun Rao 2013-10-16, 14:48