Re: produce request failed: due to Leader not local for partition
Also, looking back at my logs, I'm wondering if a producer will reuse the
same socket to send data to the same broker, for multiple topics (I'm
guessing yes).  In which case, it looks like I'm seeing this scenario:

1. producer1 is happily sending messages for topicX and topicY to serverA
(serverA is the leader for both topics, with only 1 partition for each topic).
2. serverA is restarted, and in the process, serverB becomes the new leader
for both topicX and topicY.
3. producer1 decides to send a new message to topicX to serverA.
3a. this results in an exception ("Connection reset by peer").  producer1's
connection to serverA is invalidated.
3b. producer1 makes a new metadata request for topicX, and learns that
serverB is now the leader for topicX.
3c. producer1 resends the message to topicX, on serverB.
4. producer1 decides to send a new message to topicY to serverA.
4a. producer1 notes that its socket to serverA is invalid, so it creates a
new connection to serverA.
4b. producer1 successfully sends its message to serverA (without realizing
that serverA is no longer the leader for topicY).
4c. serverA logs to its console:
2013-06-23 08:28:46,770  WARN [kafka-request-handler-2] server.KafkaApis -
[KafkaApi-508818741] Produce request with correlation id 7136261 from
client  on partition [mytopic,0] failed due to Leader not local for
partition [mytopic,0] on broker 508818741
5. producer1 continues to send messages for topicY to serverA, and serverA
continues to log the same messages.
6. 10 minutes later, producer1 decides to update its metadata for topicY,
and learns that serverB is now the leader for topicY.
7. the warning messages finally stop in the console for serverA.
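To check my own reasoning, here is a minimal sketch of the failure mode in steps 1-7 above. This is NOT the Kafka client; StubProducer and all of its fields are hypothetical names, used only to illustrate a per-topic metadata refresh combined with a shared per-broker socket:

```python
# Hypothetical model (not Kafka code): a producer that caches a leader per
# topic and a connection per broker, refreshing metadata only on failure.

class StubProducer:
    def __init__(self, cached_leaders, cluster_leaders):
        self.cached = dict(cached_leaders)   # producer's view: topic -> broker
        self.cluster = cluster_leaders       # actual leadership after restart
        self.dead_sockets = set()            # brokers whose old socket broke

    def refresh_metadata(self, topic):
        # Step 3b: metadata is re-fetched for the failed topic ONLY.
        self.cached[topic] = self.cluster[topic]

    def send(self, topic):
        broker = self.cached[topic]
        if broker in self.dead_sockets:
            # Steps 3a-3c: "Connection reset by peer" invalidates the socket;
            # reconnecting (as in step 4a) succeeds, so clear the dead mark.
            self.dead_sockets.discard(broker)
            self.refresh_metadata(topic)
            broker = self.cached[topic]
        # Return which broker actually receives the produce request.
        return broker

# Steps 1-2: serverA led both topics; after the restart serverB leads both.
p = StubProducer(
    cached_leaders={"topicX": "serverA", "topicY": "serverA"},
    cluster_leaders={"topicX": "serverB", "topicY": "serverB"},
)
p.dead_sockets.add("serverA")

first = p.send("topicX")    # socket error -> refresh -> goes to serverB
second = p.send("topicY")   # fresh socket, stale cache -> still serverA,
                            # which then logs "Leader not local" (steps 4-5)
print(first, second)
```

In this toy model the second send lands on the old leader exactly as in step 4b, because nothing ever invalidated topicY's cached leader.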

I am pretty sure this scenario, or one very close to it, is what I'm seeing
in my logs, after doing a rolling restart, with controlled shutdown.

Does this scenario make sense?

One thing I notice is that in the steady state, every 10 minutes the
producer refreshes its metadata for all topics.  However, when sending a
message to a specific topic fails, only the metadata for that topic is
refreshed, even though the ramifications should be that all topics which
have the same leader might need to be refreshed, especially in response to
a "connection reset by peer".
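For reference, the 10-minute cadence looks like the 0.8 producer's periodic metadata refresh; if my reading of the config is right, this setting (shown here with what I believe is the default value) controls it:

```properties
# 0.8 producer config: how often metadata is proactively refreshed for
# all topics (default 600000 ms = 10 minutes, matching what I observe).
topic.metadata.refresh.interval.ms=600000
```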


On Mon, Jun 24, 2013 at 10:14 PM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:

> Jun,
> To be clear, this whole discussion was started, because I am clearly
> seeing "failed due to Leader not local" on the last broker restarted,
> after all the controlled shutting down has completed and all brokers
> restarted.
> This leads me to believe that a client made a metadata request and found
> out that server A was the leader for its partition, and then server A was
> restarted, and then the client makes repeated producer requests to server
> A, without encountering a broken socket.  Thus, I'm not sure it's correct
> that the socket is invalidated in that case after a restart.
> Alternatively, could it be that the client (which sends messages to
> multiple topics), gets metadata updates for multiple topics, but doesn't
> attempt to send a message to topicX until after the leader has changed and
> server A has been restarted.  In this case, if it's the first time the
> producer sends to topicX, does it only then create a new socket?
> Jason
> On Mon, Jun 24, 2013 at 10:00 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
>> That should be fine since the old socket in the producer will no longer be
>> usable after a broker is restarted.
>> Thanks,
>> Jun
>> On Mon, Jun 24, 2013 at 9:50 PM, Jason Rosenberg <[EMAIL PROTECTED]>
>> wrote:
>> > What about a non-controlled shutdown, and a restart, but the producer
>> never
>> > attempts to send anything during the time the broker was down?  That
>> could
>> > have caused a leader change, but without the producer knowing to refresh
>> > its metadata, no?
>> >
>> >
>> > On Mon, Jun 24, 2013 at 9:05 PM, Jun Rao <[EMAIL PROTECTED]> wrote: