Kafka, mail # user - produce request failed: due to Leader not local for partition


Re: produce request failed: due to Leader not local for partition
Jun Rao 2013-06-25, 05:01
That should be fine since the old socket in the producer will no longer be
usable after a broker is restarted.

Thanks,

Jun
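
The cases discussed in this thread can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not Kafka code: the Broker, Cluster and Producer classes and the "incarnation" counter (standing in for a socket that dies when its broker restarts) are all hypothetical names invented for the illustration.

```python
class Broker:
    def __init__(self, broker_id):
        self.broker_id = broker_id
        self.incarnation = 0  # bumped on restart; old sockets die with it
        self.log = []         # messages accepted by this broker

    def restart(self):
        self.incarnation += 1


class Cluster:
    def __init__(self):
        self.brokers = {1: Broker(1), 2: Broker(2)}
        self.leader_id = 1    # broker 1 leads the single partition at first


class Producer:
    def __init__(self, cluster, acks):
        self.cluster = cluster
        self.acks = acks
        self.refresh_metadata()

    def refresh_metadata(self):
        # Look up the current leader and open a "socket" to it.
        self.target = self.cluster.brokers[self.cluster.leader_id]
        self.socket_incarnation = self.target.incarnation

    def send(self, msg):
        if self.target.incarnation != self.socket_incarnation:
            # The broker was restarted, so the old socket is unusable:
            # the send fails and the producer refreshes its metadata.
            self.refresh_metadata()
        if self.target.broker_id != self.cluster.leader_id:
            # The broker is alive but no longer the leader.
            if self.acks == 0:
                return  # no response is read, so the message is lost silently
            # With acks=1 the error response prompts a refresh and a retry.
            self.refresh_metadata()
        self.target.log.append(msg)


def run(acks, restart_old_leader):
    """Send one message, move the leader, send three more; return how many landed."""
    cluster = Cluster()
    producer = Producer(cluster, acks)
    producer.send("m0")
    cluster.leader_id = 2  # leader moves away from broker 1
    if restart_old_leader:
        cluster.brokers[1].restart()
    for i in range(1, 4):
        producer.send(f"m{i}")
    return sum(len(b.log) for b in cluster.brokers.values())


print(run(acks=0, restart_old_leader=False))
print(run(acks=1, restart_old_leader=False))
print(run(acks=0, restart_old_leader=True))
```

In this model only the first case loses messages: the old leader stays up, so the ack=0 producer never sees a failure. In the restart case the dead socket forces a metadata refresh even with ack=0, which is the point made in the reply above.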
On Mon, Jun 24, 2013 at 9:50 PM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:

> What about a non-controlled shutdown, and a restart, but the producer never
> attempts to send anything during the time the broker was down?  That could
> have caused a leader change, but without the producer knowing to refresh
> its metadata, no?
>
>
> On Mon, Jun 24, 2013 at 9:05 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
>
> > Other than controlled shutdown, the only other case that can cause the
> > leader to change when the underlying broker is alive is when the broker
> > expires its ZK session (likely due to GC), which should be rare. That being
> > said, forwarding in the broker may not be a bad idea. Could you file a JIRA
> > to track this?
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Mon, Jun 24, 2013 at 2:50 PM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:
> >
> > > Yeah,
> > >
> > > I see that with ack=0, the producer will be in a bad state anytime the
> > > leader for its partition has changed while the broker that it thinks is
> > > the leader is still up.  So this is a problem in general, not only for
> > > controlled shutdown, but even for the case where you've restarted a server
> > > (without controlled shutdown), which in and of itself can force a leader
> > > change.  If the producer doesn't attempt to send a message during the time
> > > the broker was down, it will never get a connection failure and never get
> > > fresh metadata, and will subsequently start sending messages to the
> > > non-leader.
> > >
> > > Thus, I'd say this is a problem with ack=0, regardless of controlled
> > > shutdown.  Any time there's a leader change, the producer will send
> > > messages into the ether.  I think this is actually a severe condition that
> > > could be considered a bug.  How hard would it be to have the receiving
> > > broker forward on to the leader in this case?
> > >
> > > Jason
> > >
> > >
> > > On Mon, Jun 24, 2013 at 8:44 AM, Joel Koshy <[EMAIL PROTECTED]> wrote:
> > >
> > > > I think Jason was suggesting quiescent time as a possibility only if the
> > > > broker did request forwarding if it is not the leader.
> > > >
> > > > On Monday, June 24, 2013, Jun Rao wrote:
> > > >
> > > > > Jason,
> > > > >
> > > > > The quiescence time that you proposed won't work. The reason is that with
> > > > > ack=0, the producer starts losing data silently from the moment the leader
> > > > > is moved (by controlled shutdown) until the broker is shut down. So, the
> > > > > sooner that you can shut down the broker, the better. What we realized is
> > > > > that if you can use a larger batch size, ack=1 can still deliver very good
> > > > > throughput.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > >
> > > > > On Mon, Jun 24, 2013 at 12:22 AM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > Yeah I am using ack = 0, so that makes sense.  I'll need to rethink that,
> > > > > > it would seem.  It would be nice, wouldn't it, in this case, for the broker
> > > > > > to realize this and just forward the messages to the correct leader.  Would
> > > > > > that be possible?
> > > > > >
> > > > > > Also, it would be nice to have a second option to the controlled shutdown
> > > > > > (e.g. controlled.shutdown.quiescence.ms), to allow the broker to wait
> > > > > > after the controlled shutdown, a prescribed amount of time before actually
> > > > > > shutting down the server. Then, I could set this value to something a
> > > > > > little greater than the producer's 'topic.metadata.refresh.interval.ms'.
> > > > > > This would help with hitless rolling restarts too.  Currently, every
> > > > > > producer gets a very loud "Connection Reset" with a tall stack
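
The producer settings touched on in this thread (ack=1 with a larger batch size, and the metadata refresh interval) could be gathered roughly as follows. This is a sketch, assuming the Kafka 0.8 producer property names; only topic.metadata.refresh.interval.ms is named in the thread itself, and the broker hosts and numeric values are placeholders rather than recommendations.

```python
def producer_properties(acks=1, batch_num_messages=500, metadata_refresh_ms=600000):
    """Render a producer.properties-style text block (illustrative values only)."""
    props = {
        # Hypothetical broker addresses; replace with real ones.
        "metadata.broker.list": "broker1:9092,broker2:9092",
        # Batching applies to the async producer.
        "producer.type": "async",
        # 0 = fire and forget, 1 = wait for the leader's acknowledgement.
        "request.required.acks": str(acks),
        # A larger batch amortizes the cost of waiting for the ack.
        "batch.num.messages": str(batch_num_messages),
        # How often the producer proactively refreshes topic metadata.
        "topic.metadata.refresh.interval.ms": str(metadata_refresh_ms),
    }
    return "\n".join(f"{key}={value}" for key, value in props.items())


print(producer_properties())
```

With acks set to 1, a produce request sent to a broker that is no longer the leader comes back with an error the producer can act on, instead of being dropped silently as with ack=0.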
