Re: Kafka 0.8 Failover Behavior
It's also worth mentioning why new slave machines need to truncate
back to a known good point.
When a new server joins the cluster and already has some data on disk,
we cannot blindly trust its log, as it may contain messages that were
never committed (for example, if it was the master and then crashed
suddenly). Trusting that log would leave the replicas disagreeing
about which message sits at a given offset, in effect numbering
messages differently. To ensure consistency you must truncate the new
server back to a known safe point and re-sync it from the caught-up
servers. This is always a safe operation as long as at least one
caught-up server remains alive. The dilemma you describe, which Jun
commented on, occurs when you bring back an empty server as the only
server, which of course makes it the master.
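A toy Python model may make the truncate-and-resync step concrete. It
is a sketch under stated assumptions, not Kafka's implementation:
`Replica`, `rejoin`, and the `committed` field are hypothetical names,
and `committed` only roughly stands in for the checkpointed high
watermark Kafka treats as the safe point. The scenario has an old
master holding one message that was never committed, while the new
leader has moved on with different messages at the same offsets.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Replica:
    """Hypothetical in-memory model of one replica's log for a single partition."""
    name: str
    log: List[str] = field(default_factory=list)
    # Assumed: number of leading messages known to be committed (the "safe point").
    committed: int = 0


def rejoin(rejoining: Replica, leader: Replica) -> None:
    """Truncate a rejoining replica to its safe point, then re-sync from the leader."""
    safe_point = min(rejoining.committed, len(rejoining.log))
    # Drop everything past the safe point: it may never have been committed.
    del rejoining.log[safe_point:]
    # Re-sync: copy the leader's messages from the safe point onward.
    rejoining.log.extend(leader.log[safe_point:])
    rejoining.committed = leader.committed


# Old master crashed after appending "m2-uncommitted"; only m0 and m1 were committed.
old_master = Replica("A", ["m0", "m1", "m2-uncommitted"], committed=2)
# The new leader accepted different messages at offsets 2 and 3 after the crash.
new_leader = Replica("B", ["m0", "m1", "m2", "m3"], committed=4)

# Without truncation, A and B disagree about which message is at offset 2.
print(old_master.log[2], "vs", new_leader.log[2])

rejoin(old_master, new_leader)
print(old_master.log)  # ['m0', 'm1', 'm2', 'm3']
```

The model assumes the leader holds every committed message. When an
empty server is brought back as the only server, it becomes the leader
without that history, so there is nothing correct to re-sync from.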
On Fri, Jun 21, 2013 at 8:24 PM, Jun Rao <[EMAIL PROTECTED]> wrote: