It's also worth mentioning why newly joining slave machines need to
truncate back to a known good point.
When a server joins the cluster and already has some data on disk, we
cannot blindly trust its log, as it may contain messages that were
never committed (for example, if it was the master and then crashed
suddenly). Trusting it would leave the replicas disagreeing about the
message at a given offset and numbering things differently. To
guarantee consistency, the new server must be truncated back to a
known safe point and then re-synced from the caught-up servers. This
is always a safe operation as long as one server remains alive; the
dilemma you describe, which Jun commented on, occurs when you bring
back an empty server as the only server, which of course makes it the
master.
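To make the truncate-and-re-sync idea concrete, here is a toy sketch (not Kafka code; the function and variable names are made up for illustration) of why a rejoining replica must first roll back to the last point where it agrees with a caught-up server:

```python
# Illustrative sketch: a "log" is just a list of messages, and the
# caught-up leader's log is treated as the source of truth.

def truncate_and_resync(rejoining_log, leader_log):
    """Truncate the rejoining replica back to the last offset where it
    agrees with the leader, then copy the leader's remaining messages."""
    # Find the first offset where the two logs disagree.
    safe_point = 0
    while (safe_point < len(rejoining_log)
           and safe_point < len(leader_log)
           and rejoining_log[safe_point] == leader_log[safe_point]):
        safe_point += 1
    # Drop the suffix that was never committed cluster-wide...
    truncated = rejoining_log[:safe_point]
    # ...and re-sync the rest from the caught-up leader.
    return truncated + leader_log[safe_point:]

# A crashed master may have written m3' locally before it could be
# committed; after truncation and re-sync the replicas agree again.
leader   = ["m1", "m2", "m3", "m4"]
rejoiner = ["m1", "m2", "m3'"]
print(truncate_and_resync(rejoiner, leader))  # ['m1', 'm2', 'm3', 'm4']
```

If we skipped the truncation step, offset 2 would mean m3' on one replica and m3 on the other, which is exactly the disagreement described above.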
On Fri, Jun 21, 2013 at 8:24 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
> Hi, Bob,
> Thanks for reporting this. Yes, this is the current behavior when all
> brokers fail. Whichever broker comes back first becomes the new leader and
> is the source of truth. This increases availability. However, previously
> committed data can be lost. This is what we call unclean leader elections.
> Another option is instead to wait for a broker in the in-sync replica set
> to come back before electing a new leader. This will preserve all committed
> data at the expense of availability. The application can configure the
> system with the appropriate option based on its need.
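For reference, later Kafka releases exposed exactly this tradeoff as a broker setting; the thread predates it, so treat the following as an illustration of the option Jun describes rather than something available in the 0.8 release under discussion:

```properties
# Later Kafka broker config: refuse to elect an out-of-sync replica as
# leader, preserving committed data at the cost of availability when no
# in-sync replica is alive.
unclean.leader.election.enable=false
```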
> On Fri, Jun 21, 2013 at 4:08 PM, Bob Jervis <[EMAIL PROTECTED]
>> I wanted to send this out because we saw this in some testing we were
>> doing and wanted to advise the community of something to watch for in 0.8
>> HA support.
>> We have a two machine cluster with replication factor 2. We took one
>> machine offline and re-formatted the disk. We re-installed the Kafka
>> software, but did not recreate any of the local disk files. The intention
>> was to simply re-start the broker process, but due to an error in the
>> network config that took some time to diagnose, we ended up with both
>> machines' brokers down.
>> When we fixed the network config and restarted the brokers, we happened to
>> start the broker on the rebuilt machine first. The net result was when the
>> healthy broker came back online, the rebuilt machine was already the leader
>> and because of the Zookeeper state, it forced the healthy broker to delete
>> all of its topic data, thus wiping out the entire contents of the cluster.
>> We are instituting operations procedures to safeguard against this
>> scenario in the future (and fortunately we only blew away a test cluster),
>> but this was a bit of a nasty surprise for a Friday.
>> Bob Jervis