I'm running into an issue where the controlled shutdown sometimes fails to
complete after the default 3 retry attempts. In one case this ended with
a broker undergoing an unclean shutdown, and it was in a rather
bad state after restart. Producers would connect to the metadata VIP,
still think that this broker was the leader, fail on that leader,
reconnect to the metadata VIP, and get sent back to that same
failed broker! Does that make sense?
I'm trying to understand the conditions that cause the controlled shutdown
to fail. There doesn't seem to be a setting for the maximum amount of time to
keep trying. It would be nice to specify how long to try before giving up
(hopefully giving up in a graceful way).
Instead, we have a retry count, but it's not clear what this retry count
really specifies in terms of how long the broker keeps trying.
Also, what are the ramifications of different settings for
controlled.shutdown.retry.backoff.ms? Is there a reason we want to wait
before retrying (again, it would be helpful to understand the reasons
for a controlled shutdown failure)?
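
To make the question concrete, here is a rough sketch of how I currently read
the two settings together. It assumes the broker sleeps for the backoff once
after each failed attempt, which I have not confirmed, and it ignores the time
each attempt itself takes (for example waiting on the controller), so it is at
best a lower bound. The function name is my own, not anything from Kafka; I
believe the retry count setting is controlled.shutdown.max.retries.

```python
def min_backoff_wait_ms(max_retries: int, retry_backoff_ms: int) -> int:
    """Lower bound on time spent sleeping between controlled shutdown attempts.

    Hypothetical helper: assumes one backoff sleep per failed attempt and
    does not include the duration of the attempts themselves.
    """
    if max_retries < 0 or retry_backoff_ms < 0:
        raise ValueError("max_retries and retry_backoff_ms must be non-negative")
    return max_retries * retry_backoff_ms


# With 3 retries and a 5000 ms backoff, the broker would spend at least
# 15 seconds in backoff before giving up, under the assumption above.
print(min_backoff_wait_ms(3, 5000))
```

If that reading is right, the only way to get a longer window is to raise the
retry count or the backoff, which is an indirect way of expressing a timeout.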