Kafka >> mail # user >> produce request failed: due to Leader not local for partition


Jason Rosenberg 2013-06-23, 08:45
Jun Rao 2013-06-24, 03:23
Jason Rosenberg 2013-06-24, 07:23
Joel Koshy 2013-06-24, 08:57
Jun Rao 2013-06-24, 14:50
Joel Koshy 2013-06-24, 15:45
Jason Rosenberg 2013-06-24, 21:50
Jun Rao 2013-06-25, 04:05
Re: produce request failed: due to Leader not local for partition
What about a non-controlled shutdown and restart, where the producer never
attempts to send anything while the broker is down?  That could
cause a leader change without the producer knowing to refresh
its metadata, no?
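
To make the scenario concrete, here is a toy Python model of it. This is only a sketch under stated assumptions, not Kafka code: `ToyCluster`, `ToyProducer`, `bounce`, and `simulate` are hypothetical names invented for illustration, and the model assumes that a producer using ack=1 refreshes its metadata and retries once when the broker answers that it is not the leader.

```python
class ToyCluster:
    """Hypothetical two-broker cluster hosting a single partition."""

    def __init__(self, brokers, leader):
        self.up = set(brokers)
        self.leader = leader
        self.log = []  # messages accepted by whichever broker is leader

    def bounce(self, broker):
        """Stop a broker, let leadership move, then start it again."""
        self.up.discard(broker)
        if self.leader == broker:
            self.leader = min(self.up)  # a surviving broker takes over
        self.up.add(broker)  # the restarted broker returns as a follower

    def produce(self, broker, msg):
        """Return True if appended, False if this broker is not the leader."""
        if broker not in self.up:
            raise ConnectionError(f"broker {broker} is down")
        if broker != self.leader:
            return False  # stands in for "Leader not local for partition"
        self.log.append(msg)
        return True


class ToyProducer:
    def __init__(self, cluster, acks):
        self.cluster = cluster
        self.acks = acks
        self.cached_leader = cluster.leader  # metadata fetched at startup

    def send(self, msg):
        accepted = self.cluster.produce(self.cached_leader, msg)
        if self.acks == 0:
            return  # fire and forget: the broker's answer is never read
        if not accepted:
            # Assumed ack=1 behaviour: the error triggers a refresh and a retry.
            self.cached_leader = self.cluster.leader
            self.cluster.produce(self.cached_leader, msg)


def simulate(acks, n_messages=5):
    """Return how many messages reach the leader after an unnoticed restart."""
    cluster = ToyCluster(brokers=[1, 2], leader=1)
    producer = ToyProducer(cluster, acks)
    cluster.bounce(1)  # the producer is idle for the whole restart
    for i in range(n_messages):
        producer.send(f"msg-{i}")
    return len(cluster.log)


print(simulate(acks=0))  # prints 0: every message goes to the old leader
print(simulate(acks=1))  # prints 5: the first error forces a metadata refresh
```

In this model the ack=0 producer never sees a connection failure, because the old leader is back up by the time it sends, so nothing ever prompts it to refresh.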
On Mon, Jun 24, 2013 at 9:05 PM, Jun Rao <[EMAIL PROTECTED]> wrote:

> Other than controlled shutdown, the only other case that can cause the
> leader to change when the underlying broker is alive is when the broker
> expires its ZK session (likely due to GC), which should be rare. That being
> said, forwarding in the broker may not be a bad idea. Could you file a jira
> to track this?
>
> Thanks,
>
> Jun
>
>
> On Mon, Jun 24, 2013 at 2:50 PM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:
>
> > Yeah,
> >
> > I see that with ack=0, the producer will be in a bad state anytime the
> > leader for its partition has changed, while the broker that it thinks is
> > the leader is still up.  So this is a problem in general, not only for
> > controlled shutdown, but even for the case where you've restarted a server
> > (without controlled shutdown), which in and of itself can force a leader
> > change.  If the producer doesn't attempt to send a message during the time
> > the broker was down, it will never get a connection failure, and never get
> > fresh metadata, and subsequently start sending messages to the non-leader.
> >
> > Thus, I'd say this is a problem with ack=0, regardless of controlled
> > shutdown.  Any time there's a leader change, the producer will send
> > messages into the ether.  I think this is actually a severe condition, that
> > could be considered a bug.  How hard would it be to have the receiving
> > broker forward on to the leader, in this case?
> >
> > Jason
> >
> >
> > On Mon, Jun 24, 2013 at 8:44 AM, Joel Koshy <[EMAIL PROTECTED]> wrote:
> >
> > > I think Jason was suggesting quiescent time as a possibility only if the
> > > broker did request forwarding if it is not the leader.
> > >
> > > On Monday, June 24, 2013, Jun Rao wrote:
> > >
> > > > Jason,
> > > >
> > > > The quiescence time that you proposed won't work. The reason is that with
> > > > ack=0, the producer starts losing data silently from the moment the leader
> > > > is moved (by controlled shutdown) until the broker is shut down. So, the
> > > > sooner that you can shut down the broker, the better. What we realized is
> > > > that if you can use a larger batch size, ack=1 can still deliver very good
> > > > throughput.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Mon, Jun 24, 2013 at 12:22 AM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Yeah I am using ack = 0, so that makes sense.  I'll need to rethink that,
> > > > > it would seem.  It would be nice, wouldn't it, in this case, for the broker
> > > > > to realize this and just forward the messages to the correct leader.  Would
> > > > > that be possible?
> > > > >
> > > > > Also, it would be nice to have a second option to the controlled shutdown
> > > > > (e.g. controlled.shutdown.quiescence.ms), to allow the broker to wait
> > > > > after the controlled shutdown, a prescribed amount of time before actually
> > > > > shutting down the server. Then, I could set this value to something a
> > > > > little greater than the producer's 'topic.metadata.refresh.interval.ms'.
> > > > > This would help with hitless rolling restarts too.  Currently, every
> > > > > producer gets a very loud "Connection Reset" with a tall stack trace each
> > > > > time I restart a broker.  Would be nicer to have the producers still be
> > > > > able to produce until the metadata refresh interval expires, then get the
> > > > > word that the leader has moved due to the controlled shutdown, and then
> > > > > start producing to the new leader, all before the shutting down server
> > >
 
Jun Rao 2013-06-25, 05:01
Jason Rosenberg 2013-06-25, 05:14
Jason Rosenberg 2013-06-25, 05:25
Jason Rosenberg 2013-06-25, 05:53
Jason Rosenberg 2013-06-29, 13:22
Jun Rao 2013-07-01, 04:33
Jason Rosenberg 2013-06-23, 09:04