Kafka replication aims to guarantee that committed writes are not lost. In
other words, as long as leader can be transitioned to a broker that was in
the ISR, no data will be lost. For increased availability, if there are no
other brokers in the ISR, we fall back to electing a broker that is not
caught up with the current leader, as the new leader. IMO, this is the real
problem that the post is complaining about.
Let me explain his test in more detail-
1. The first part of the test partitions the leader (n1) from other brokers
(n2-n5). The leader shrinks the ISR to just itself and ends up taking n
writes. This is not a problem all by itself. Once the partition is
resolved, n2-n5 would catch up from the leader and no writes will be lost,
since n1 would continue to serve as the leader.
2. The problem starts in the second part of the test where it partitions
the leader (n1) from zookeeper. This causes the unclean leader election
(mentioned above), which causes Kafka to lose data.
We thought about this while designing replication, but never ended up
including the feature that would allow some applications to pick
consistency over availability. Basically, we could let applications pick
some topics for which the controller will never attempt unclean leader
election. The result is that Kafka would reject writes and mark the
partition offline, instead of moving leadership to a broker that is not in
ISR, and losing the writes.
I think if we included this knob, the tests that aphyr (jepsen) ran, would
make more sense.
On Thu, Aug 22, 2013 at 12:50 PM, Scott Clasen <[EMAIL PROTECTED]> wrote: