We are running Kafka 0.8.1.1 on three machines: A, B, and C. Suppose a topic has replication factor 2 and one of its partitions, partition 1, is placed on brokers A and B. If broker A is already down, then for partition 1 we have Leader: B, ISR: [B]. If the current controller is node C, then killing broker B moves partition 1 to the state Leader: -1, ISR: (empty). But if the current controller is node B, killing it does not update the leader/ISR for partition 1, even after the controller has been re-elected on node C, so partition 1 will forever report node B, which is dead, as its leader.
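To make the expected outcome concrete, here is a minimal Java sketch of the transition we would expect in both cases. It is a toy model and not the controller code: `PartitionState` and `afterFailure` are hypothetical names, and picking the first surviving ISR member as the new leader is an assumption made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model of per-partition state; NOT Kafka's controller code.
final class PartitionState {
    static final int NO_LEADER = -1;

    final int leader;
    final List<Integer> isr;

    PartitionState(int leader, List<Integer> isr) {
        this.leader = leader;
        this.isr = isr;
    }

    // Hypothetical helper: the state we expect once the failure has been
    // processed, given the set of brokers that are still alive.
    PartitionState afterFailure(Set<Integer> liveBrokers) {
        if (liveBrokers.contains(leader)) {
            return this; // leader survived, nothing to change
        }
        // Keep only the in-sync replicas that are still alive.
        List<Integer> liveIsr = new ArrayList<>();
        for (int broker : isr) {
            if (liveBrokers.contains(broker)) {
                liveIsr.add(broker);
            }
        }
        // Assumption: the first surviving ISR member takes over, else no leader.
        int newLeader = liveIsr.isEmpty() ? NO_LEADER : liveIsr.get(0);
        return new PartitionState(newLeader, liveIsr);
    }
}
```

With brokers A, B, C numbered 0, 1, 2, starting from leader 1 and ISR [1] and leaving only broker 2 alive, the model yields leader -1 and an empty ISR. That is what we observe when C is the controller, and what we would also expect when the killed broker B was the controller.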
It looks like KafkaController.onBrokerFailure handles the case where the failed broker is the partition leader: it sets the new leader value to -1. In contrast, KafkaController.onControllerFailover never removes the leader from a partition whose replicas are all offline, presumably because the partition goes into the ReplicaDeletionIneligible state. Is this the intended behavior?
This behavior affects DefaultEventHandler.getPartition in the null-key case: it cannot tell that partition 1 has no leader, and this results in event send failures. What we are trying to achieve is to be able to keep writing data even if some partitions have lost all their replicas, which is a rare but still possible scenario. Using a null key looked suitable, with minor DefaultEventHandler modifications (such as removing DefaultEventHandler.sendPartitionPerTopicCache to avoid caching and uneven event distribution), as we neither use log compaction nor rely on how the data is partitioned. We had this behavior with Kafka 0.7: if a node is down, simply produce to a different one.

Thanks,
Alex
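P.S. The selection logic we have in mind for null-key messages is roughly the following. This is a minimal sketch and not the actual DefaultEventHandler code: `LeaderAwareChooser` and `choose` are hypothetical names, and the input map of partition id to leader broker id is assumed to be built from topic metadata. It only helps if the metadata actually reports -1 for a partition with no live replicas, which is exactly what does not happen in the scenario above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical helper; not part of Kafka's producer API.
final class LeaderAwareChooser {
    static final int NO_LEADER = -1;

    private final Random random;

    LeaderAwareChooser(Random random) {
        this.random = random;
    }

    // leaders maps partition id -> leader broker id (NO_LEADER if none).
    // Returns a random partition that has a leader, or -1 if none has one.
    int choose(Map<Integer, Integer> leaders) {
        List<Integer> available = new ArrayList<>();
        for (Map.Entry<Integer, Integer> entry : leaders.entrySet()) {
            if (entry.getValue() != NO_LEADER) {
                available.add(entry.getKey());
            }
        }
        if (available.isEmpty()) {
            return -1; // every partition is leaderless; the caller must retry or fail
        }
        // Choose afresh on every call, with no per-topic caching, so that
        // events spread evenly over the partitions that are still writable.
        return available.get(random.nextInt(available.size()));
    }
}
```

If partition 1 were correctly reported with leader -1, `choose` would never return it, and producing would continue on the remaining partitions.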
Re: Killing last replica for partition doesn't change ISR/Leadership if replica is running controller