Kafka >> mail # user >> Understanding how producers and consumers behave in case of node failures in 0.8


Re: Understanding how producers and consumers behave in case of node failures in 0.8
>> publishing to and consumption from the partition will halt
>> and will not resume until the faulty leader node recovers

Can you confirm that's the case? I think they won't wait until the leader
recovers and will instead try to elect a new leader from the existing
non-ISR replicas. And if they do wait, what happens if the faulty leader
never comes back?
On Thu, Oct 24, 2013 at 6:24 AM, Aniket Bhatnagar <
[EMAIL PROTECTED]> wrote:

> Thanks Neha
>
>
> On 24 October 2013 18:11, Neha Narkhede <[EMAIL PROTECTED]> wrote:
>
> > Yes. And during retries, the producer and consumer refetch metadata.
> >
> > Thanks,
> > Neha
> > On Oct 24, 2013 3:09 AM, "Aniket Bhatnagar" <[EMAIL PROTECTED]>
> > wrote:
> >
> > > I am trying to understand and document how producers & consumers
> > > will/should behave in case of node failures in 0.8. I know there are
> > > various other threads that discuss this but I wanted to bring all the
> > > information together in one post. This should help people building
> > > producers & consumers in other languages as well. Here is my
> > > understanding of how Kafka behaves during failures:
> > >
> > > Case 1: If a node fails that wasn't a leader for any partitions
> > > No impact on consumers and producers
> > >
> > > Case 2: If a leader node fails but another in-sync node can become the
> > > leader
> > > All publishing to and consumption from the partition whose leader
> > > failed will momentarily stop until a new leader is elected. Producers
> > > should implement retry logic in such cases (and, in fact, for all kinds
> > > of errors from Kafka), and consumers can (depending on your use case)
> > > either continue to other partitions after retrying a decent number of
> > > times (in case you are fetching from partitions in round-robin fashion)
> > > or keep retrying until a leader is available.
> > >
> > > Case 3: If a leader node goes down and no other in-sync nodes are
> > > available
> > > In this case, publishing to and consumption from the partition will
> > > halt and will not resume until the faulty leader node recovers.
> > > Producers should fail the publish request after retrying a decent
> > > number of times and provide a callback to the client of the producer
> > > to take corrective action. Consumers again have a choice to continue
> > > to other partitions after retrying a decent number of times (in case
> > > you are fetching from partitions in round-robin fashion) or keep
> > > retrying until a leader is available. In the latter case, the entire
> > > consumer process will halt until the faulty node recovers.
> > >
> > > Do I have this right?
> > >
> >
>
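The producer-side behavior discussed in Cases 2 and 3 above (retry, refetch metadata between attempts, and surface a failure callback once retries are exhausted) can be sketched roughly as below. This is a minimal, client-agnostic sketch in Python: `send`, `refresh_metadata`, and `on_failure` are hypothetical hooks standing in for whatever your Kafka client library actually provides, not real Kafka API names.

```python
import time


class LeaderNotAvailableError(Exception):
    """Stand-in for the error a client raises when a partition has no leader."""


def publish_with_retries(send, refresh_metadata, message,
                         max_retries=3, backoff_s=0.1, on_failure=None):
    """Try to publish `message`, refetching metadata between attempts.

    `send(message)` attempts one publish; `refresh_metadata()` refetches
    partition leadership (Case 2: a new leader may have been elected).
    If all retries fail (Case 3: no leader comes back), invoke the
    `on_failure` callback so the caller can take corrective action.
    """
    for attempt in range(max_retries):
        try:
            return send(message)
        except LeaderNotAvailableError:
            # Leader moved or is gone: refetch metadata, back off, retry.
            refresh_metadata()
            time.sleep(backoff_s * (attempt + 1))
    # Retries exhausted -- fail the publish request via the callback.
    if on_failure is not None:
        on_failure(message)
    return None
```

The same shape applies on the consumer side: replace `send` with a fetch, and on exhausted retries either skip to the next partition (round-robin fetching) or keep looping, accepting that the consumer halts until the leader recovers.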

 