Kafka >> mail # user >> produce request failed: due to Leader not local for partition

Jason Rosenberg 2013-06-25, 05:53
Re: produce request failed: due to Leader not local for partition
I added this scenario to KAFKA-955.

I'm thinking that this scenario could be a problem for ack=0 in general
(even without controlled shutdown).  If we do an "uncontrolled" shutdown,
it seems that some topics won't ever know there could have been a leader
change.  Would it make sense to force a meta-data refresh for all topics on
a broker, any time an IOException happens on a socket (e.g. "connection
reset")?  Currently, it looks like only the topic that experiences the
failure will have a metadata refresh issued for it.
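
To make the difference between the two refresh policies concrete, here is a minimal toy model of the scenario quoted below. It is only a sketch under stated assumptions: `Cluster`, `SketchProducer`, and `run` are hypothetical names, not Kafka's producer classes, and the model assumes ack=0, one partition per topic, and a socket that fails exactly once after the broker restarts.

```python
# Toy model only: these classes are hypothetical and do not mirror
# Kafka's real producer internals.

class Cluster:
    """Holds the true leader for each topic (one partition per topic)."""

    def __init__(self, leaders):
        self.leaders = dict(leaders)

    def fetch_leader(self, topic):
        # Stands in for a topic metadata request.
        return self.leaders[topic]


class SketchProducer:
    def __init__(self, cluster, refresh_all_on_io_error):
        self.cluster = cluster
        self.refresh_all_on_io_error = refresh_all_on_io_error
        self.cached = dict(cluster.leaders)  # topic -> leader the producer believes in
        self.broken = set()                  # brokers whose existing socket is dead
        self.misrouted = []                  # (topic, broker) sends that reached a non-leader

    def on_broker_restart(self, broker):
        # The existing socket to this broker will fail on its next use.
        self.broken.add(broker)

    def send(self, topic):
        broker = self.cached[topic]
        if broker in self.broken:
            # First send on the dead socket: "Connection reset by peer".
            # A later send to this broker opens a fresh, working socket.
            self.broken.discard(broker)
            if self.refresh_all_on_io_error:
                # Proposed policy: refresh every topic cached on this broker.
                stale = [t for t, b in self.cached.items() if b == broker]
            else:
                # Policy described in this thread: refresh only the failed topic.
                stale = [topic]
            for t in stale:
                self.cached[t] = self.cluster.fetch_leader(t)
            broker = self.cached[topic]  # resend to the refreshed leader
        if broker != self.cluster.leaders[topic]:
            # With ack=0 the producer never learns that the broker logged
            # "Leader not local for partition".
            self.misrouted.append((topic, broker))
        return broker


def run(refresh_all):
    cluster = Cluster({"topicX": "serverA", "topicY": "serverA"})
    producer = SketchProducer(cluster, refresh_all)
    # serverA restarts; serverB takes over both topics.
    cluster.leaders.update({"topicX": "serverB", "topicY": "serverB"})
    producer.on_broker_restart("serverA")
    producer.send("topicX")  # hits the dead socket and triggers a refresh
    producer.send("topicY")  # uses whatever leader is cached for topicY
    return producer.misrouted
```

In this model, `run(False)` returns `[("topicY", "serverA")]`, i.e. the topicY message goes to the old leader, while `run(True)` returns an empty list because the socket failure refreshes every topic that was cached on serverA.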

Maybe this should be a separate jira issue, now that I think about it.

On Mon, Jun 24, 2013 at 10:52 PM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:

> Also, looking back at my logs, I'm wondering if a producer will reuse the
> same socket to send data to the same broker, for multiple topics (I'm
> guessing yes).  In which case, it looks like I'm seeing this scenario:
> 1. producer1 is happily sending messages for topicX and topicY to serverA
> (serverA is the leader for both topics, only 1 partition for each topic for
> simplicity).
> 2. serverA is restarted, and in the process, serverB becomes the new
> leader for both topicX and topicY.
> 3. producer1 decides to send a new message to topicX to serverA.
> 3a. this results in an exception ("Connection reset by peer").
>  producer1's connection to serverA is invalidated.
> 3b. producer1 makes a new metadata request for topicX, and learns that
> serverB is now the leader for topicX.
> 3c. producer1 resends the message to topicX, on serverB.
> 4. producer1 decides to send a new message to topicY to serverA.
> 4a. producer1 notes that its socket to serverA is invalid, so it creates
> a new connection to serverA.
> 4b. producer1 successfully sends its message to serverA (without
> realizing that serverA is no longer the leader for topicY).
> 4c. serverA logs to its console:
> 2013-06-23 08:28:46,770  WARN [kafka-request-handler-2] server.KafkaApis -
> [KafkaApi-508818741] Produce request with correlation id 7136261 from
> client  on partition [mytopic,0] failed due to Leader not local for
> partition [mytopic,0] on broker 508818741
> 5. producer1 continues to send messages for topicY to serverA, and serverA
> continues to log the same messages.
> 6. 10 minutes later, producer1 decides to update its metadata for topicY,
> and learns that serverB is now the leader for topicY.
> 7. the warning messages finally stop in the console for serverA.
> I am pretty sure this scenario, or one very close to it, is what I'm
> seeing in my logs, after doing a rolling restart, with controlled shutdown.
> Does this scenario make sense?
> One thing I notice, is that in the steady state, every 10 minutes the
> producer refreshes its metadata for all topics.  However, when sending a
> message to a specific topic fails, only the metadata for that topic is
> refreshed, even though the ramifications should be that all topics which
> have the same leader might need to be refreshed, especially in response to
> a "connection reset by peer".
> Jason
> On Mon, Jun 24, 2013 at 10:14 PM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:
>> Jun,
>> To be clear, this whole discussion was started, because I am clearly
>> seeing "failed due to Leader not local" on the last broker restarted,
>> after all the controlled shutting down has completed and all brokers
>> restarted.
>> This leads me to believe that a client made a metadata request and found
>> out that server A was the leader for its partition, and then server A was
>> restarted, and then the client makes repeated producer requests to server
>> A, without encountering a broken socket.  Thus, I'm not sure it's correct
>> that the socket is invalidated in that case after a restart.
>> Alternatively, could it be that the client (which sends messages to
>> multiple topics), gets metadata updates for multiple topics, but doesn't
>> attempt to send a message to topicX until after the leader has changed and
>> server A has been restarted.  In this case, if it's the first time the
