Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Kafka >> mail # user >> Understanding how producers and consumers behave in case of node failures in 0.8


+
Aniket Bhatnagar 2013-10-24, 10:09
Copy link to this message
-
Re: Understanding how producers and consumers behave in case of node failures in 0.8
Yes. And during retries, the producer and consumer refetch metadata.

Thanks,
Neha
On Oct 24, 2013 3:09 AM, "Aniket Bhatnagar" <[EMAIL PROTECTED]>
wrote:

> I am trying to understand and document how producers & consumers
> will/should behave in case of node failures in 0.8. I know there are
> various other threads that discuss this but I wanted to bring all the
> information together in one post. This should help people building
> producers & consumers in other languages as well. Here is my understanding
> of how Kafak behaves in failures:
>
> Case 1: If a node fails that wasn't a leader for any partitions
> No impact on consumers and producers
>
> Case 2: If a leader node fails but another in sync node can be become a
> leader
> All publishing to and consumption from the partition whose leader failed
> will momentarily stop until a new leader is elected. Producers should
> implement retry logic in such cases (and in fact in all kinds of errors
> from Kafka) and consumers can (depending on your use case) either continue
> to other partitions after retrying decent number of times (in case you are
> fetching from partitions in round robin fashion) or keep retrying until
> leader is available.
>
> Case 3: If a leader node goes down and no other in sync nodes are available
> In this case, publishing to and consumption from the partition will halt
> and will not resume until the faulty leader node recovers. In this case,
> producers should fail the publish request after retrying decent number of
> times and provide a callback to the client of the producer to take
> corrective action. Consumers again have a choice to continue to other
> partitions after retrying decent number of times (in case you are fetching
> from partitions in round robin fashion) or keep retrying until leader is
> available. In case of latter, the entire consumer process will halt until
> the faulty node recovers.
>
> Do I have this right?
>

 
+
Aniket Bhatnagar 2013-10-24, 13:25
+
Kane Kane 2013-10-24, 19:00
+
Neha Narkhede 2013-10-24, 21:08
+
Chris Bedford 2013-10-24, 21:28
+
Aniket Bhatnagar 2013-10-25, 01:40