I wanted to send this out because we saw this in some testing we were doing and wanted to advise the community of something to watch for in 0.8 HA support.
We have a two-machine cluster with replication factor 2. We took one machine offline and re-formatted the disk. We re-installed the Kafka software, but did not recreate any of the local disk files. The intention was to simply re-start the broker process, but due to an error in the network config that took some time to diagnose, we ended up with both machines' brokers down.
When we fixed the network config and restarted the brokers, we happened to start the broker on the rebuilt machine first. The net result was that when the healthy broker came back online, the rebuilt machine was already the leader, and because of the ZooKeeper state, it forced the healthy broker to delete all of its topic data, thus wiping out the entire contents of the cluster.
We are instituting operations procedures to safeguard against this scenario in the future (and fortunately we only blew away a test cluster), but this was a bit of a nasty surprise for a Friday.
Thanks for reporting this. Yes, this is the current behavior when all brokers fail. Whichever broker comes back first becomes the new leader and is the source of truth. This increases availability. However, previously committed data can be lost. This is what we call an unclean leader election. Another option is to instead wait for a broker in the in-sync replica set to come back before electing a new leader. This preserves all committed data at the expense of availability. The application can configure the system with the appropriate option based on its needs.
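For reference, later Kafka releases exposed exactly this trade-off as a configuration setting (it is not available in the 0.8.0 release this thread discusses; the property name below is from those later releases). A sketch of the relevant server.properties fragment:

```properties
# Forbid out-of-sync replicas from being elected leader.
# false = prefer consistency (partition stays offline until an ISR
#         member returns); true = prefer availability (the 0.8 behavior
#         described above, which can silently lose committed data).
unclean.leader.election.enable=false
```

With this set to false, the scenario in the first message would have left the partition unavailable until the healthy broker returned, rather than wiping its data.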
Jun

On Fri, Jun 21, 2013 at 4:08 PM, Bob Jervis <[EMAIL PROTECTED]> wrote:
It's also worth mentioning why new slave machines need to truncate back to a known good point.
When a new server joins the cluster and already has some data on disk, we cannot blindly trust its log, as it may contain messages that were never committed (for example, if it was the master and then crashed suddenly). This would lead to a situation where the replicas disagree about the message at a given offset and are numbering things differently. To ensure consistency you must truncate the new server back to a known safe point and re-sync it from the caught-up servers. This is always a safe operation as long as one server remains alive. The dilemma you describe, and Jun commented on, occurs when you bring back an empty server as the only server, which of course makes it the master.
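The truncate-then-resync step above can be illustrated with a tiny in-memory model (this is a hypothetical sketch of the idea, not Kafka's actual replication code; the logs are plain Python lists and the "known safe point" is modeled as a high-watermark offset):

```python
def rejoin(rejoining_log, leader_log, high_watermark):
    """Model of a replica rejoining the cluster: truncate its log back to
    the last known-committed offset (the high watermark), then copy the
    current leader's suffix so both replicas agree at every offset."""
    truncated = rejoining_log[:high_watermark]      # drop the possibly-uncommitted tail
    return truncated + leader_log[high_watermark:]  # re-sync the rest from the leader

# The old master crashed after writing "m3", which was never committed.
crashed_master = ["m0", "m1", "m2", "m3"]
# Meanwhile the surviving replica became leader and committed different
# messages at offsets 3 and 4.
leader = ["m0", "m1", "m2", "x3", "x4"]

print(rejoin(crashed_master, leader, high_watermark=3))
# -> ['m0', 'm1', 'm2', 'x3', 'x4']
```

Without the truncation, offset 3 would hold "m3" on one replica and "x3" on the other, which is exactly the disagreement described above.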
On Fri, Jun 21, 2013 at 8:24 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
Jun, does Kafka provide the ability to configure a broker to wait until it is in-sync before becoming available? In case all brokers crash, is it possible to find out which node has the most recent data, in order to initiate a proper startup procedure?
Thanks, Vadim On Fri, Jun 21, 2013 at 8:24 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
A broker still in the ISR, as recorded in ZooKeeper, has all committed data.
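In Kafka 0.8, each partition's leader and ISR are stored as JSON in ZooKeeper under /brokers/topics/&lt;topic&gt;/partitions/&lt;n&gt;/state, so you can inspect that node (with zkCli.sh or any ZooKeeper client) to see which brokers are safe to start first. A sketch of parsing such a payload (the JSON below is an illustrative example, not captured from a real cluster):

```python
import json

# Example payload as stored under
# /brokers/topics/<topic>/partitions/<n>/state in ZooKeeper.
state = json.loads('{"leader":1,"leader_epoch":7,"isr":[1],"version":1}')

def safe_brokers(partition_state):
    """Brokers listed in the ISR are guaranteed to hold all committed
    data for this partition, so they are the safe ones to start first."""
    return partition_state["isr"]

print(safe_brokers(state))  # -> [1]
```

In the scenario that started this thread, the rebuilt (empty) broker would not have been in the ISR, which is the signal that it should not have been started first.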
Jun On Thu, Jun 27, 2013 at 5:04 PM, Vadim Keylis <[EMAIL PROTECTED]> wrote: