Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Kafka >> mail # dev >> Conditional leader and ISR update fails forever


Copy link to this message
-
Conditional leader and ISR update fails forever
I'm not quite sure how we get into this state, but we've seen this a few times now. Basically, one of our brokers (broker 1 in this case) gets into a state where ISR updates fail forever:

[2013-10-16 06:19:12,448] ERROR Conditional update of path /brokers/topics/search-gateway-wal/partitions/5/state with data { "controller_epoch":62, "isr":[ 1, 3 ], "leader":1, "leader_epoch":61, "version":1 } and expected version 125 failed due to org.apache.zookeeper.\
KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /brokers/topics/search-gateway-wal/partitions/5/state (kafka.utils.ZkUtils$)
[2013-10-16 06:19:12,448] INFO Partition [search-gateway-wal,5] on broker 1: Cached zkVersion [125] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)

This repeats over and over again for a subset of the partitions. Looking at other brokers in the cluster it seems that they think that broker 1 is also the controller and that the partition in this example has the following state:

(search-gateway-wal,5) -> (LeaderAndIsrInfo:(Leader:1, ISR:1,LeaderEpoch:61,ControllerEpoch:62),ReplicationFactor:3),AllReplicas:1,3,4)

Looking at the code in Partition, it seems that the zkVersion is only ever updated on makeFollower/makeLeader

Any ideas on how we may have gotten into this state?

/Sam