Kafka >> mail # user >> produce request failed: due to Leader not local for partition


Jason Rosenberg 2013-06-23, 08:45
Jun Rao 2013-06-24, 03:23
Jason Rosenberg 2013-06-24, 07:23
Joel Koshy 2013-06-24, 08:57
Jun Rao 2013-06-24, 14:50
Joel Koshy 2013-06-24, 15:45
Re: produce request failed: due to Leader not local for partition
Yeah,

I see that with ack=0, the producer will be in a bad state any time the
leader for its partition has changed while the broker that it thinks is the
leader is still up.  So this is a problem in general, not only for
controlled shutdown, but even for the case where you've restarted a server
(without controlled shutdown), which in and of itself can force a leader
change.  If the producer doesn't attempt to send a message during the time
the broker was down, it will never get a connection failure, never get
fresh metadata, and will subsequently keep sending messages to the non-leader.

Thus, I'd say this is a problem with ack=0, regardless of controlled
shutdown.  Any time there's a leader change, the producer will send
messages into the ether.  I think this is actually a severe condition that
could be considered a bug.  How hard would it be to have the receiving
broker forward the request on to the leader, in this case?

Jason
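
[For reference: the ack behavior discussed above corresponds to the 0.8
producer's request.required.acks setting. Below is a minimal sketch using
the 0.8 Java producer API; the broker hosts, topic name, and class name are
placeholders, and the batch/refresh values shown are only illustrative.]

    import java.util.Properties;
    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class AckDemoProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Placeholder broker list; replace with real hosts.
            props.put("metadata.broker.list", "broker1:9092,broker2:9092");
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            // With "0" the producer never hears about broker errors, so it
            // can keep writing to a stale leader after a leader move. With
            // "1" a NotLeaderForPartition error comes back, prompting a
            // metadata refresh and retry.
            props.put("request.required.acks", "1");
            // Batching (per Jun's note) recovers most of the throughput
            // lost by switching from ack=0 to ack=1.
            props.put("producer.type", "async");
            props.put("batch.num.messages", "200");
            // Periodic metadata refresh, referenced later in the thread.
            props.put("topic.metadata.refresh.interval.ms", "600000");

            Producer<String, String> producer =
                    new Producer<String, String>(new ProducerConfig(props));
            producer.send(
                    new KeyedMessage<String, String>("my-topic", "key", "a message"));
            producer.close();
        }
    }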
On Mon, Jun 24, 2013 at 8:44 AM, Joel Koshy <[EMAIL PROTECTED]> wrote:

> I think Jason was suggesting quiescent time as a possibility only if the
> broker did request forwarding if it is not the leader.
>
> On Monday, June 24, 2013, Jun Rao wrote:
>
> > Jason,
> >
> > The quiescence time that you proposed won't work. The reason is that
> > with ack=0, the producer starts losing data silently from the moment the
> > leader is moved (by controlled shutdown) until the broker is shut down.
> > So, the sooner that you can shut down the broker, the better. What we
> > realized is that if you can use a larger batch size, ack=1 can still
> > deliver very good throughput.
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Mon, Jun 24, 2013 at 12:22 AM, Jason Rosenberg <[EMAIL PROTECTED]>
> > wrote:
> >
> > > Yeah I am using ack = 0, so that makes sense.  I'll need to rethink
> > > that, it would seem.  It would be nice, wouldn't it, in this case, for
> > > the broker to realize this and just forward the messages to the correct
> > > leader.  Would that be possible?
> > >
> > > Also, it would be nice to have a second option to the controlled
> > > shutdown (e.g. controlled.shutdown.quiescence.ms), to allow the broker
> > > to wait after the controlled shutdown a prescribed amount of time
> > > before actually shutting down the server.  Then, I could set this value
> > > to something a little greater than the producer's
> > > 'topic.metadata.refresh.interval.ms'.  This would help with hitless
> > > rolling restarts too.  Currently, every producer gets a very loud
> > > "Connection Reset" with a tall stack trace each time I restart a
> > > broker.  Would be nicer to have the producers still be able to produce
> > > until the metadata refresh interval expires, then get the word that the
> > > leader has moved due to the controlled shutdown, and then start
> > > producing to the new leader, all before the shutting-down server
> > > actually shuts down.  Does that seem feasible?
> > >
> > > Jason
> > >
> > >
> > > On Sun, Jun 23, 2013 at 8:23 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
> > >
> > > > Jason,
> > > >
> > > > Are you using ack = 0 in the producer? This mode doesn't work well
> > > > with controlled shutdown (this is explained in the FAQ at
> > > > https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#).
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Sun, Jun 23, 2013 at 1:45 AM, Jason Rosenberg <[EMAIL PROTECTED]>
> > > > wrote:
> > > >
> > > > > I'm working on having seamless rolling restarts for my kafka
> > > > > servers, running 0.8.  I have it so that each server will be
> > > > > restarted sequentially.  Each server takes itself out of the load
> > > > > balancer (e.g. sets a status that the lb will recognize, and then
> > > > > waits more than long enough for the lb to stop sending meta-data
> > > > > requests to that server).  Then I initiate the shutdown (with
> > > > > controlled.shutdown.enable=true).  This
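
[On the broker side, the controlled shutdown discussed throughout the
thread is driven by a few server.properties settings. A sketch of the
relevant 0.8 configs follows; note that the controlled.shutdown.quiescence.ms
option proposed above is only a suggestion in this thread, not an existing
setting, so it is omitted here.]

    # Ask the controller to move partition leadership off this broker
    # before it stops, so producers with ack >= 1 fail over cleanly.
    controlled.shutdown.enable=true
    # If the leadership transfer doesn't complete, retry a few times
    # before proceeding with the shutdown anyway.
    controlled.shutdown.max.retries=3
    controlled.shutdown.retry.backoff.ms=5000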

 
Jun Rao 2013-06-25, 04:05
Jason Rosenberg 2013-06-25, 04:50
Jun Rao 2013-06-25, 05:01
Jason Rosenberg 2013-06-25, 05:14
Jason Rosenberg 2013-06-25, 05:25
Jason Rosenberg 2013-06-25, 05:53
Jason Rosenberg 2013-06-29, 13:22
Jun Rao 2013-07-01, 04:33
Jason Rosenberg 2013-06-23, 09:04