I have a cluster of 3 Kafka servers with a replication factor of 3. Two of the 3 servers were shut down, and traffic was sent only to the one server that was up. I brought the second host back up, and according to its logs the server has started.
I ran ./kafka-list-topic.sh --zookeeper <host>, and it still showed that the leaders were not distributed. I then ran kafka-preferred-replica-election.sh, which exited with the following error:
kafka.common.AdminCommandFailedException: Admin command failed
    at kafka.admin.PreferredReplicaLeaderElectionCommand.moveLeaderToPreferredReplica(PreferredReplicaLeaderElectionCommand.scala:119)
    at kafka.admin.PreferredReplicaLeaderElectionCommand$.main(PreferredReplicaLeaderElectionCommand.scala:60)
    at kafka.admin.PreferredReplicaLeaderElectionCommand.main(PreferredReplicaLeaderElectionCommand.scala)
Could you suggest what might have caused this exception and how to recover from it?
I think the error message could be improved to at least print the partitions whose leader it couldn't move. What could be happening is that the 2 brokers that were down have not yet rejoined the ISR (in-sync replica set), so the tool cannot move any leaders to them. You can run kafka-list-topics with the --under-replicated-count option to print the list of under-replicated partitions.
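To make the reasoning above concrete, here is a minimal Python sketch, assuming the partition state is available as simple records with a leader, an assigned replica list and an ISR (roughly what the topic listing tool prints). The PartitionState type, the helper names and the sample data are all hypothetical. The sketch treats the first replica in the assignment as the preferred replica and assumes, as described above, that leadership can only be moved to a preferred replica that is currently in the ISR.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PartitionState:
    """Hypothetical model of one partition's state."""
    topic: str
    partition: int
    leader: int
    replicas: Tuple[int, ...]  # assigned replicas; the first one is the preferred replica
    isr: Tuple[int, ...]       # replicas currently in sync with the leader


def under_replicated(partitions: List[PartitionState]) -> List[PartitionState]:
    """Partitions whose ISR is smaller than their assigned replica set."""
    return [p for p in partitions if len(p.isr) < len(p.replicas)]


def blocked_elections(partitions: List[PartitionState]) -> List[Tuple[str, int, int]]:
    """(topic, partition, preferred broker) for every partition that is not led by
    its preferred replica and cannot be moved there yet, because the preferred
    replica has not rejoined the ISR."""
    blocked = []
    for p in partitions:
        preferred = p.replicas[0]
        if p.leader != preferred and preferred not in p.isr:
            blocked.append((p.topic, p.partition, preferred))
    return blocked


# Hypothetical scenario: broker 1 stayed up and leads everything, broker 2 has
# restarted but only caught up on partition 2, and broker 3 is still down.
state = [
    PartitionState("events", 0, leader=1, replicas=(1, 2, 3), isr=(1,)),
    PartitionState("events", 1, leader=1, replicas=(2, 3, 1), isr=(1,)),
    PartitionState("events", 2, leader=1, replicas=(3, 1, 2), isr=(1, 2)),
]

print(blocked_elections(state))  # [('events', 1, 2), ('events', 2, 3)]
```

In this scenario partition 0 is already led by its preferred replica, while partitions 1 and 2 cannot be moved until brokers 2 and 3 rejoin their ISRs. All three show up as under-replicated, which is why listing the under-replicated partitions is a useful first check.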
Could you please file a bug to improve the error reporting of this tool?
Thanks,
Neha

On Mon, Aug 19, 2013 at 12:26 PM, Vadim Keylis <[EMAIL PROTECTED]> wrote:
You can monitor the under-replicated partition count through the "kafka.server.UnderReplicatedPartitions" JMX bean on every leader. Another, more heavyweight, option is to run kafka-list-topics, but I would recommend that only for diagnostic purposes, not for monitoring.
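Because the bean is read "on every leader", each broker's value presumably covers only the partitions that broker currently leads (our reading, not confirmed in this thread), so a monitor has to poll every broker rather than one. The following Python sketch models that with hypothetical in-memory data. The `fetch` callable is a placeholder for whatever JMX client you use to read each broker's value; no real JMX access is shown, and all names here are illustrative.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical model: (leader broker id, assigned replicas, in-sync replicas) per partition.
Partition = Tuple[int, Tuple[int, ...], Tuple[int, ...]]


def per_leader_counts(partitions: Iterable[Partition]) -> Dict[int, int]:
    """Count under-replicated partitions per leader broker, i.e. the value we
    assume each leader would report for its own partitions."""
    counts = Counter()
    for leader, replicas, isr in partitions:
        if len(isr) < len(replicas):
            counts[leader] += 1
    return dict(counts)


def check_cluster(brokers: List[int], fetch: Callable[[int], int]) -> List[str]:
    """Poll every broker via `fetch` (placeholder for a JMX read) and return an
    alert message for each broker that reports a non-zero count."""
    alerts = []
    for broker in brokers:
        value = fetch(broker)
        if value > 0:
            alerts.append(f"broker {broker}: {value} under-replicated partition(s)")
    return alerts


partitions = [
    (1, (1, 2, 3), (1,)),
    (1, (2, 3, 1), (1, 2)),
    (2, (2, 1, 3), (2, 1, 3)),
]
counts = per_leader_counts(partitions)  # {1: 2}
print(check_cluster([1, 2, 3], lambda b: counts.get(b, 0)))
# ['broker 1: 2 under-replicated partition(s)']
```

Polling only brokers 2 and 3 in this example would report nothing, even though two partitions are under-replicated, which is why the value needs to be checked on every broker.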
Thanks,
Neha

On Mon, Aug 19, 2013 at 1:07 PM, Vadim Keylis <[EMAIL PROTECTED]> wrote: