Kafka >> mail # dev >> Conditional leader and ISR update fails forever

Conditional leader and ISR update fails forever
I'm not quite sure how we get into this state, but we've seen this a few times now. Basically, one of our brokers (broker 1 in this case) gets into a state where ISR updates fail forever:

[2013-10-16 06:19:12,448] ERROR Conditional update of path /brokers/topics/search-gateway-wal/partitions/5/state with data { "controller_epoch":62, "isr":[ 1, 3 ], "leader":1, "leader_epoch":61, "version":1 } and expected version 125 failed due to org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /brokers/topics/search-gateway-wal/partitions/5/state (kafka.utils.ZkUtils$)
[2013-10-16 06:19:12,448] INFO Partition [search-gateway-wal,5] on broker 1: Cached zkVersion [125] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)

This repeats over and over again for a subset of the partitions. Looking at the other brokers in the cluster, it seems that they think broker 1 is also the controller, and that the partition in this example has the following state:

(search-gateway-wal,5) -> (LeaderAndIsrInfo:(Leader:1, ISR:1,LeaderEpoch:61,ControllerEpoch:62),ReplicationFactor:3),AllReplicas:1,3,4)

Looking at the code in Partition, it seems that the zkVersion is only ever updated on makeFollower/makeLeader.
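
As a rough illustration of the failure mode described above, here is a toy Python model of a versioned node with a conditional (compare-and-set) update. It is not Kafka or ZooKeeper code: `VersionedNode`, `CachingWriter`, and `BadVersionError` are hypothetical names, and the out-of-band write is an assumption, since how the real cluster got into this state is not known. It only shows that a writer which caches the version and never re-reads it will keep failing once the cache is stale.

```python
class BadVersionError(Exception):
    """Stand-in for ZooKeeper's BadVersion error (hypothetical name)."""


class VersionedNode:
    """Toy model of a znode: data plus a version bumped on every write."""

    def __init__(self, data):
        self.data = data
        self.version = 0

    def conditional_set(self, data, expected_version):
        # Compare-and-set: write only if the caller's version matches.
        if expected_version != self.version:
            raise BadVersionError(
                f"expected {expected_version}, actual {self.version}")
        self.data = data
        self.version += 1
        return self.version


class CachingWriter:
    """Hypothetical writer that caches the version and never re-reads it."""

    def __init__(self, node):
        self.node = node
        self.cached_version = node.version

    def try_update(self, data):
        try:
            self.cached_version = self.node.conditional_set(
                data, self.cached_version)
            return True
        except BadVersionError:
            # Analogous to "skip updating ISR": the cache is left stale.
            return False


node = VersionedNode({"leader": 1, "isr": [1, 3, 4]})
writer = CachingWriter(node)

# Assumption: some other party writes the node without the writer noticing.
node.conditional_set({"leader": 1, "isr": [1]}, node.version)

# Every later attempt fails, because nothing refreshes cached_version.
results = [writer.try_update({"leader": 1, "isr": [1, 3]}) for _ in range(3)]
print(results, writer.cached_version, node.version)
```

In this sketch the only way out is for something to reset `cached_version` from the node, which is the role makeFollower/makeLeader appear to play for zkVersion.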

Any ideas on how we may have gotten into this state?