Kafka >> mail # user >> produce request failed: due to Leader not local for partition


Jason Rosenberg 2013-06-23, 08:45
Jun Rao 2013-06-24, 03:23
Jason Rosenberg 2013-06-24, 07:23
Joel Koshy 2013-06-24, 08:57
Jun Rao 2013-06-24, 14:50
Joel Koshy 2013-06-24, 15:45
Jason Rosenberg 2013-06-24, 21:50
Jun Rao 2013-06-25, 04:05
Jason Rosenberg 2013-06-25, 04:50
Jun Rao 2013-06-25, 05:01
Jason Rosenberg 2013-06-25, 05:14
Jason Rosenberg 2013-06-25, 05:25
Jason Rosenberg 2013-06-25, 05:53
Re: produce request failed: due to Leader not local for partition
I added this scenario to KAFKA-955.

I'm thinking that this scenario could be a problem for ack=0 in general
(even without controlled shutdown).  If we do an "uncontrolled" shutdown,
it seems that, for some topics, the producer will never learn that there
could have been a leader change.  Would it make sense to force a metadata
refresh for all topics on a broker, any time an IOException happens on a
socket (e.g. "connection reset")?  Currently, it looks like only the topic
that experiences the failure has a metadata refresh issued for it.
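
To sketch the idea (this is just an illustration, not the actual producer
code; the class and the cache shown here are made up):

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Idea: after an IOException (e.g. "connection reset by peer") talking to a
// broker, treat the cached metadata for *every* topic whose leader was on
// that broker as stale, instead of refreshing only the topic that failed.
public class StaleMetadataSketch {

  // topic -> broker id of the cached leader (one partition per topic, for
  // simplicity, as in the scenario below)
  private final Map<String, Integer> cachedLeaders = new HashMap<String, Integer>();

  public void recordLeader(String topic, int brokerId) {
    cachedLeaders.put(topic, brokerId);
  }

  // Topics whose metadata should be refreshed after an IO error on the
  // socket to failedBrokerId.
  public Set<String> topicsToRefresh(int failedBrokerId) {
    Set<String> stale = new HashSet<String>();
    for (Map.Entry<String, Integer> e : cachedLeaders.entrySet()) {
      if (e.getValue() == failedBrokerId) {
        stale.add(e.getKey()); // picks up topicY as well, not just topicX
      }
    }
    return stale;
  }
}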

Maybe this should be a separate jira issue, now that I think about it.
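
For reference, here's roughly the producer configuration I have in mind when
I talk about ack=0 and the 10-minute metadata refresh (the broker list and
serializer below are placeholders; I believe the last two values are the
defaults, at least in 0.8):

import java.util.Properties;

public class ProducerConfigSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    // Placeholder broker list and serializer:
    props.put("metadata.broker.list", "serverA:9092,serverB:9092");
    props.put("serializer.class", "kafka.serializer.StringEncoder");
    // ack=0: the producer does not wait for any acknowledgement from the
    // broker, so a "Leader not local" failure is only visible in the
    // broker's log.
    props.put("request.required.acks", "0");
    // The producer only proactively re-fetches metadata on this interval:
    props.put("topic.metadata.refresh.interval.ms", "600000"); // 10 minutes
    System.out.println(props);
  }
}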

Jason
On Mon, Jun 24, 2013 at 10:52 PM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:

> Also, looking back at my logs, I'm wondering if a producer will reuse the
> same socket to send data to the same broker, for multiple topics (I'm
> guessing yes).  In which case, it looks like I'm seeing this scenario:
>
> 1. producer1 is happily sending messages for topicX and topicY to serverA
> (serverA is the leader for both topics, only 1 partition for each topic for
> simplicity).
> 2. serverA is restarted, and in the process, serverB becomes the new
> leader for both topicX and topicY.
> 3. producer1 decides to send a new message to topicX to serverA.
> 3a. this results in an exception ("Connection reset by peer").
>  producer1's connection to serverA is invalidated.
> 3b. producer1 makes a new metadata request for topicX, and learns that
> serverB is now the leader for topicX.
> 3c. producer1 resends the message to topicX, on serverB.
> 4. producer1 decides to send a new message to topicY to serverA.
> 4a. producer1 notes that its socket to serverA is invalid, so it creates
> a new connection to serverA.
> 4b. producer1 successfully sends its message to serverA (without
> realizing that serverA is no longer the leader for topicY).
> 4c. serverA logs to its console:
> 2013-06-23 08:28:46,770  WARN [kafka-request-handler-2] server.KafkaApis -
> [KafkaApi-508818741] Produce request with correlation id 7136261 from
> client  on partition [mytopic,0] failed due to Leader not local for
> partition [mytopic,0] on broker 508818741
> 5. producer1 continues to send messages for topicY to serverA, and serverA
> continues to log the same messages.
> 6. 10 minutes later, producer1 decides to update its metadata for topicY,
> and learns that serverB is now the leader for topicY.
> 7. the warning messages finally stop in the console for serverA.
>
> I am pretty sure this scenario, or one very close to it, is what I'm
> seeing in my logs, after doing a rolling restart, with controlled shutdown.
>
> Does this scenario make sense?
>
> One thing I notice is that in the steady state, every 10 minutes the
> producer refreshes its metadata for all topics.  However, when sending a
> message to a specific topic fails, only the metadata for that topic is
> refreshed, even though the implication is that all topics which have the
> same leader might need to be refreshed, especially in response to a
> "connection reset by peer".
>
> Jason
>
>
>
> On Mon, Jun 24, 2013 at 10:14 PM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:
>
>> Jun,
>>
>> To be clear, this whole discussion was started because I am clearly
>> seeing "failed due to Leader not local" on the last broker restarted,
>> after all the controlled shutdowns have completed and all brokers have
>> been restarted.
>>
>> This leads me to believe that a client made a metadata request and found
>> out that server A was the leader for its partition, and then server A was
>> restarted, and then the client makes repeated producer requests to server
>> A, without encountering a broken socket.  Thus, I'm not sure it's correct
>> that the socket is invalidated in that case after a restart.
>>
>> Alternatively, could it be that the client (which sends messages to
>> multiple topics) gets metadata updates for multiple topics, but doesn't
>> attempt to send a message to topicX until after the leader has changed and
>> server A has been restarted?  In this case, if it's the first time the

 
Jun Rao 2013-07-01, 04:33
Jason Rosenberg 2013-06-23, 09:04