Re: Unpacking "strong durability and fault-tolerance guarantees"
Jay Kreps 2013-07-07, 21:36
I think there may be some confusion as we significantly strengthened the
guarantees in the latest release (0.8) and since this is pretty recent we
haven't updated all the documentation. Prior to this we did not support
As of 0.8:
1. If your replication factor is 3 you can tolerate 2 failures before you
2. This question is a little confusing--there are guarantees for both the
producer and the consumer.
Kafka does have a configurable acknowledgement policy for the producer as
of 0.8 using the request.required.acks setting. If this is set to 0 it does
not wait. If it is set to 1 it waits only for the broker to which the data
was written. If it is set to -1 it waits for all caught up replicas.
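
As a rough illustration of the three settings just described, here is a toy Python model, not Kafka client code. The function name acks_awaited and the in_sync_replicas parameter are made up for this sketch; only the request.required.acks values 0, 1 and -1 come from the description above.

```python
def acks_awaited(required_acks: int, in_sync_replicas: int) -> int:
    """Return how many broker acknowledgements a producer would wait for.

    Toy model of request.required.acks as described above:
      0  -> do not wait at all
      1  -> wait only for the broker the data was written to
      -1 -> wait for all caught-up replicas
    """
    if required_acks == 0:
        return 0
    if required_acks == 1:
        return 1
    if required_acks == -1:
        return in_sync_replicas
    # This sketch only models the three values discussed in the text.
    raise ValueError("value not modeled in this sketch: %d" % required_acks)


if __name__ == "__main__":
    for setting in (0, 1, -1):
        print(setting, "->", acks_awaited(setting, in_sync_replicas=3))
```

The trade-off is latency against durability: the more acknowledgements the producer waits for, the slower each send, but the less likely an acknowledged message is lost when a broker fails.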
The consumer's position is controlled using a saved "offset" that marks its
position in the topic/partition it is reading. This position is
periodically updated. If you update the saved offset before processing
messages you have the possibility of message loss if your consumer crashes
before processing the messages. If you update the offset after processing
the messages you have the possibility of duplicate messages when your
consumer restarts as it will reprocess a few messages it has already seen.
You can control when the position is saved by calling commit().
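
The difference between saving the offset before and after processing can be sketched with a small simulation. This is a toy model in plain Python, assuming a list as the log, a dict as the saved consumer state, and a commit after every message (a real consumer saves its position periodically, so more than one message may be replayed). The names Crash, consume and run_with_crash are hypothetical and not part of any Kafka API.

```python
class Crash(Exception):
    """Simulated consumer failure."""


def consume(log, state, commit_before, crash_at=None):
    """Read from the saved offset to the end of the log.

    state["offset"] is the saved position; state["processed"] records
    every message the application actually acted on.
    """
    pos = state["offset"]
    while pos < len(log):
        if commit_before:
            state["offset"] = pos + 1          # save position first
            if pos == crash_at:
                raise Crash()                  # dies before processing
            state["processed"].append(log[pos])
        else:
            state["processed"].append(log[pos])
            if pos == crash_at:
                raise Crash()                  # dies before saving position
            state["offset"] = pos + 1
        pos += 1


def run_with_crash(log, commit_before, crash_at):
    """Run once with a crash at index crash_at, then restart from the saved offset."""
    state = {"offset": 0, "processed": []}
    try:
        consume(log, state, commit_before, crash_at)
    except Crash:
        pass
    consume(log, state, commit_before)         # restart after the crash
    return state["processed"]


if __name__ == "__main__":
    log = ["m0", "m1", "m2", "m3"]
    print(run_with_crash(log, commit_before=True, crash_at=2))
    print(run_with_crash(log, commit_before=False, crash_at=2))
```

With commit_before=True the message at the crash point is never processed (loss); with commit_before=False it is processed twice (duplicate). If duplicates are acceptable or processing is idempotent, saving the offset after processing is the safer choice.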
On Sun, Jul 7, 2013 at 8:35 AM, David James <[EMAIL PROTECTED]> wrote:
> Sorry for the long email, but I've tried to keep it organized, at least.
> "Kafka has a modern cluster-centric design that offers strong
> durability and fault-tolerance guarantees." and "Messages are
> persisted on disk and replicated within the cluster to prevent data
> loss." according to http://kafka.apache.org/.
> I'm trying to understand what this means in some detail. So, two questions.
> 1. Fault-Tolerance
> If a Broker in a Kafka cluster fails (the EC2 instance dies), what
> happens? After, let's say I add a new Broker to the cluster (that's my
> responsibility, not Kafka's). What happens when it rejoins?
> To be more particular, if the cluster consists of a Zookeeper and B
> (3, for example) Brokers, can a Kafka system guarantee to tolerate up
> to B-1 (2, for example) Broker failures?
> 2. Durability at an application level
> What are the guarantees about durability, at an application level, in
> practice? By "application level" I mean guarantees that a produced
> message gets consumed and acted upon by an application that uses
> Kafka. My understanding at present is that Kafka does not make these
> kinds of guarantees because there are no acks. So, it is up to the
> application developer to handle it. Is this right?
> Here's my understanding: Having messages persisted on disk and
> replicated is why Kafka has durability guarantees. But, from an
> application perspective, what happens when a consumer pulls a message
> but fails before acting on it? That would update the Kafka consumer
> offset, right? So, without some thinking and planning ahead on the
> Kafka system design, the application's consumers would not have a way
> of knowing that a message was not actually processed.
> Conclusion / Last Question
> I'm interested in making the chance of message loss minimal, at a
> system level. Any pointers on what to read or think about would be