Kafka, mail # user - Unpacking "strong durability and fault-tolerance guarantees"


David James 2013-07-07, 15:36
Re: Unpacking "strong durability and fault-tolerance guarantees"
Philip O'Toole 2013-07-07, 19:51
Disclaimer: I only have experience right now with 0.7.2

On Sun, Jul 7, 2013 at 11:35 AM, David James <[EMAIL PROTECTED]> wrote:
> Sorry for the long email, but I've tried to keep it organized, at least.
>
> "Kafka has a modern cluster-centric design that offers strong
> durability and fault-tolerance guarantees." and "Messages are
> persisted on disk and replicated within the cluster to prevent data
> loss." according to http://kafka.apache.org/.
>
> I'm trying to understand what this means in some detail. So, two questions.
>
> 1. Fault-Tolerance
>
> If a Broker in a Kafka cluster fails (the EC2 instance dies), what
> happens? Afterwards, let's say I add a new Broker to the cluster (that's
> my responsibility, not Kafka's). What happens when it rejoins?
>
> To be more particular, if the cluster consists of a Zookeeper and B
> (3, for example) Brokers, can a Kafka system guarantee to tolerate up
> to B-1 (2, for example) Broker failures?
>
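In 0.8 and later (not the 0.7.2 covered by the disclaimer above), this is governed by the topic's replication factor: a topic created with replication factor N is designed to tolerate N-1 broker failures without losing messages committed to the log. Below is a minimal sketch of creating such a topic, using the much later Java AdminClient API with an illustrative broker address, topic name, and partition count:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateReplicatedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // Replication factor 3 spreads each partition across three brokers,
            // so committed messages survive the loss of up to two of them.
            NewTopic topic = new NewTopic("events", 6, (short) 3); // name/partitions illustrative
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}
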
> 2. Durability at an application level
>
> What are the guarantees about durability, at an application level, in
> practice? By "application level" I mean guarantees that a produced
> message gets consumed and acted upon by an application that uses
> Kafka. My understanding at present is that Kafka does not make these
> kinds of guarantees because there are no acks. So, it is up to the
> application developer to handle it. Is this right?

Yes.

>
> Here's my understanding: Having messages persisted on disk and
> replicated is why Kafka has durability guarantees. But, from an
> application perspective, what happens when a consumer pulls a message
> but fails before acting on it? That would update the Kafka consumer
> offset, right? So, without some thinking and planning ahead on the
> Kafka system design, the application's consumers would not have a way
> of knowing that a message was not actually processed.

Don't update the offset until the message has been fully processed. This
means you need to build downstream systems that can accept messages
being replayed, since a message may be processed but the consumer may
crash before the offset is updated. Or at least have a process in
place to deal with clean-up in the event of a crash.
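
A minimal sketch of that pattern, using the Java consumer API from much later Kafka releases (the 0.7/0.8 clients discussed in this thread work differently): auto-commit is disabled, and the offset is committed only after every record in the batch has been processed, so a crash replays the batch rather than silently skipping it. The broker address, topic, group id, and process() handler are all illustrative.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AtLeastOnceConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "example-group");      // illustrative group id
        props.put("enable.auto.commit", "false");    // commit manually, after processing
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("events")); // illustrative topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // must be idempotent: a crash before commitSync() replays it
                }
                // Commit only after every record in the batch has been processed.
                // If the consumer dies before this line, the batch is redelivered.
                consumer.commitSync();
            }
        }
    }

    // Placeholder for the application's real (idempotent) handler.
    private static void process(ConsumerRecord<String, String> record) {
        System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
    }
}

The era-appropriate equivalent would be to disable auto-commit on the ZooKeeper-based high-level consumer and call its commitOffsets() method only after processing, rather than relying on the periodic background commit.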

>
> Conclusion / Last Question
>
> I'm interested in making the chance of message loss minimal, at a
> system level. Any pointers on what to read or think about would be
> appreciated!
>
> Thanks!
> -David

 
Jay Kreps 2013-07-07, 21:36
Florin Trofin 2013-07-13, 01:39
Eric Sites 2013-07-13, 02:58