Jason Rosenberg 2013-10-04, 18:21
Neha Narkhede 2013-10-04, 22:38
Jason Rosenberg 2013-10-05, 06:52
Neha Narkhede 2013-10-05, 17:01
Jason Rosenberg 2013-10-05, 17:18
Neha Narkhede 2013-10-06, 08:09
Jason Rosenberg 2013-10-06, 17:09
Neha Narkhede 2013-10-06, 20:30
Jason Rosenberg 2013-10-07, 04:14
As Neha said, what you said is possible, but may require a more careful
design. For example, what if the followers don't catch up with the leader
quickly? Do we want to wait forever or up to some configurable amount of
time? If we do the latter, we may still lose data during controlled
On Sun, Oct 6, 2013 at 9:14 PM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:
> Thanks Neha for continued insight....
> What you describe as a possible solution is what I was thinking (although I
> wasn't as concerned as maybe I should be with the added delay of the new
> leader delaying processing new requests while it finishes consuming from
> the old leader, and communicates back and forth to complete the leader
> E.g., isn't the definition of a broker being in the ISR that it is keeping
> itself up to date with the leader of the ISR (within an allowed replication
> lag)? So, it should be possible to elect a new leader, have it buffer
> incoming requests while it finishes replicating everything from the old
> leader (which it should complete within an allowed replication lag
> timeout), and then start acking any buffered requests.
> I guess this buffering period would be akin to the leader 'unavailability'
> window, but in reality, it is just a delay (and shouldn't be much more than
> the replication lag timeout). The producing client can decide to timeout
> the request if it's taking too long, and retry it (that's normal anyway if
> a producer fails to get an ack, etc.).
> So, as long as the old leader atomically starts rejecting incoming requests
> at the time it relinquishes leadership, then producer requests will fail
> fast, initiate a new meta data request to find the new leader, and continue
> on with the new leader (possibly after a bit of a replication catch up
> The old leader can then proceed with shutdown after the new leader has
> caught up (which it will signal with an RPC).
> I realize there are all sorts of edge cases here, but it seems there should
> be a way to make it work.
> I guess I'm willing to allow a bit of an 'unavailability' delay, rather
> than have messages silently acked then lost during a controlled
> shutdown/new leader election.
> My hunch is, in the steady state, when leadership is stable and brokers
> aren't being shutdown, the performance benefit of being able to use
> request.required.acks=1 (instead of -1), far outweighs any momentary
> performance blip during a leader availability delay during a leadership
> change (which should be fully recoverable and retryable by a concerned
> producer client).
> Now, of course, if I want to guard against a hard-shutdown, then that's a
> whole different ball of wax!
> On Sun, Oct 6, 2013 at 4:30 PM, Neha Narkhede <[EMAIL PROTECTED]
> > Ok, so if I initiate a controlled shutdown, in which all partitions that
> > shutting down broker is leader of get transferred to another broker, why
> > can't part of that controlled transfer of leadership include ISR
> > synchronization, such that no data is lost? Is there a fundamental
> > why that is not possible? Is it it worth filing a Jira for a feature
> > request? Or is it technically not possible?
> > It is not as straightforward as it seems and it will slow down the shut
> > down operation furthermore (currently several zookeeper writes already
> > it down) and also increase the leader unavailability window. But keeping
> > the performance degradation aside, it is tricky since in order to stop
> > "new" data from coming in, we need to move the leaders off of the current
> > leader (broker being shutdown) onto some other follower. Now moving the
> > leader means some other follower will become the leader and as part of
> > will stop copying existing data from the old leader and will start
> > receiving new data. What you are asking for is to insert some kind of
> > "wait" before the new follower becomes the leader so that the consumption