I am trying to debug a strange issue we are seeing. We are using
"Sarama", our own Go implementation of the Kafka API.
Somehow, we either found a bug in Kafka, have a bug in our own code, or
got our cluster into a weird state. If we use our library to query Kafka
for metadata about any partition, it often (but not always!) returns an
ISR that is a strict subset of the replica set.
However, if I use the "bin/kafka-list-topic.sh" script that comes with Kafka,
it shows the two sets as equal. I believe that tool gets its data from
Zookeeper. If I look at Zookeeper directly, I also see matching sets.
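To make the mismatch concrete, here is a minimal sketch of the comparison I mean. It assumes the replica and ISR broker IDs for a partition have already been fetched (from a metadata response or from Zookeeper) as plain []int32 slices. The names laggingReplicas and describe, and the sample values in main, are hypothetical and not part of Sarama or Kafka.

```go
package main

import (
	"fmt"
	"sort"
)

// laggingReplicas returns the broker IDs that appear in replicas but not in
// isr, sorted ascending. An empty result means the ISR covers every replica.
// (Hypothetical helper, not part of any Kafka client library.)
func laggingReplicas(replicas, isr []int32) []int32 {
	inSync := make(map[int32]bool, len(isr))
	for _, id := range isr {
		inSync[id] = true
	}
	var lagging []int32
	for _, id := range replicas {
		if !inSync[id] {
			lagging = append(lagging, id)
		}
	}
	sort.Slice(lagging, func(i, j int) bool { return lagging[i] < lagging[j] })
	return lagging
}

// describe formats the comparison for one partition so that results from
// different sources can be logged side by side.
func describe(topic string, partition int32, replicas, isr []int32) string {
	lagging := laggingReplicas(replicas, isr)
	if len(lagging) == 0 {
		return fmt.Sprintf("%s/%d: ISR matches replicas %v", topic, partition, replicas)
	}
	return fmt.Sprintf("%s/%d: replicas %v, ISR %v, not in sync: %v",
		topic, partition, replicas, isr, lagging)
}

func main() {
	// Made-up values for illustration only; real ones would come from the
	// broker's metadata response and from Zookeeper.
	fmt.Println(describe("example-topic", 0, []int32{1, 2, 3}, []int32{1, 3}))
	fmt.Println(describe("example-topic", 1, []int32{1, 2, 3}, []int32{3, 2, 1}))
}
```

Running the same check against both sources for the same partition, at roughly the same time, should show whether the two views really disagree or whether the ISR is just changing between queries.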
Does anybody have any idea what could cause this? Which is the more
reliable source for this data: Kafka or Zookeeper?
If it's not a bug in the reporting, what could have caused the replicas
to fall out of sync, and how do we trigger them to catch up? They don't
seem to do it automatically.