Thanks Neha for continued insight....
What you describe as a possible solution is what I was thinking (although I
wasn't as concerned as maybe I should be with the added delay of the new
leader delaying processing new requests while it finishes consuming from
the old leader, and communicates back and forth to complete the leader
E.g., isn't the definition of a broker being in the ISR that it is keeping
itself up to date with the leader of the ISR (within an allowed replication
lag)? So, it should be possible to elect a new leader, have it buffer
incoming requests while it finishes replicating everything from the old
leader (which it should complete within an allowed replication lag
timeout), and then start acking any buffered requests.
I guess this buffering period would be akin to the leader 'unavailability'
window, but in reality, it is just a delay (and shouldn't be much more than
the replication lag timeout). The producing client can decide to timeout
the request if it's taking too long, and retry it (that's normal anyway if
a producer fails to get an ack, etc.).
So, as long as the old leader atomically starts rejecting incoming requests
at the time it relinquishes leadership, then producer requests will fail
fast, initiate a new meta data request to find the new leader, and continue
on with the new leader (possibly after a bit of a replication catch up
The old leader can then proceed with shutdown after the new leader has
caught up (which it will signal with an RPC).
I realize there are all sorts of edge cases here, but it seems there should
be a way to make it work.
I guess I'm willing to allow a bit of an 'unavailability' delay, rather
than have messages silently acked then lost during a controlled
shutdown/new leader election.
My hunch is, in the steady state, when leadership is stable and brokers
aren't being shutdown, the performance benefit of being able to use
request.required.acks=1 (instead of -1), far outweighs any momentary
performance blip during a leader availability delay during a leadership
change (which should be fully recoverable and retryable by a concerned
Now, of course, if I want to guard against a hard-shutdown, then that's a
whole different ball of wax!
On Sun, Oct 6, 2013 at 4:30 PM, Neha Narkhede <[EMAIL PROTECTED]>wrote: