Also, looking back at my logs, I'm wondering if a producer will reuse the
same socket to send data to the same broker, for multiple topics (I'm
guessing yes).  In which case, it looks like I'm seeing this scenario:

1. producer1 is happily sending messages for topicX and topicY to serverA
(serverA is the leader for both topics, with only 1 partition per topic).
2. serverA is restarted, and in the process, serverB becomes the new leader
for both topicX and topicY.
3. producer1 decides to send a new message to topicX to serverA.
3a. this results in an exception ("Connection reset by peer").  producer1's
connection to serverA is invalidated.
3b. producer1 makes a new metadata request for topicX, and learns that
serverB is now the leader for topicX.
3c. producer1 resends the message to topicX, on serverB.
4. producer1 decides to send a new message to topicY to serverA.
4a. producer1 notes that its socket to serverA is invalid, so it creates a
new connection to serverA.
4b. producer1 successfully sends its message to serverA (without realizing
that serverA is no longer the leader for topicY).
4c. serverA logs to its console:
2013-06-23 08:28:46,770  WARN [kafka-request-handler-2] server.KafkaApis -
[KafkaApi-508818741] Produce request with correlation id 7136261 from
client  on partition [mytopic,0] failed due to Leader not local for
partition [mytopic,0] on broker 508818741
5. producer1 continues to send messages for topicY to serverA, and serverA
continues to log the same messages.
6. 10 minutes later, producer1 decides to update its metadata for topicY,
and learns that serverB is now the leader for topicY.
7. the warning messages finally stop in the console for serverA.
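
To make the sequence concrete, here is a toy model of steps 2 through 7.
It is only a sketch of the behavior I think I'm seeing, not actual Kafka
client code: ToyProducer, run_scenario, and the three socket states are
made-up names, and the real producer may well differ in the details.

```python
# Toy model of the scenario above. This is NOT Kafka client code: ToyProducer,
# run_scenario, and the three socket states are hypothetical names.

class ToyProducer:
    def __init__(self, leaders):
        # topic -> broker the producer currently believes is the leader
        self.cached = dict(leaders)
        # broker -> "open", "stale" (broker restarted, producer unaware), "closed"
        self.sockets = {broker: "open" for broker in set(leaders.values())}

    def send(self, topic, cluster, log):
        broker = self.cached[topic]
        state = self.sockets.get(broker, "closed")
        if state == "stale":
            # 3a: the write fails and the socket is invalidated
            self.sockets[broker] = "closed"
            log.append(f"{topic}: connection reset by {broker}")
            # 3b: metadata is refreshed for this one topic only
            self.cached[topic] = cluster[topic]
            # 3c: resend, now to the new leader
            return self.send(topic, cluster, log)
        if state == "closed":
            # 4a: reconnect to the cached leader without re-checking metadata
            self.sockets[broker] = "open"
        if cluster[topic] == broker:
            log.append(f"{topic}: delivered to {broker}")
        else:
            # 4b/4c: the send goes through, but the broker is not the leader
            log.append(f"{topic}: {broker} logs 'Leader not local'")
        return broker


def run_scenario():
    log = []
    producer = ToyProducer({"topicX": "serverA", "topicY": "serverA"})
    # Step 2: serverA restarts and serverB takes over both topics.
    cluster = {"topicX": "serverB", "topicY": "serverB"}
    producer.sockets["serverA"] = "stale"
    producer.send("topicX", cluster, log)  # step 3
    producer.send("topicY", cluster, log)  # step 4
    producer.send("topicY", cluster, log)  # step 5
    producer.cached.update(cluster)        # step 6: periodic full refresh
    producer.send("topicY", cluster, log)  # step 7
    return log
```

In this model topicX recovers right away because its send hits the reset,
while topicY keeps going to serverA until the periodic refresh.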

I am pretty sure this scenario, or one very close to it, is what I'm seeing
in my logs, after doing a rolling restart, with controlled shutdown.

Does this scenario make sense?

One thing I notice is that, in the steady state, the producer refreshes its
metadata for all topics every 10 minutes.  However, when sending a message
to a specific topic fails, only the metadata for that topic is refreshed.
It seems to me that a failure like this, especially a "connection reset by
peer", implies that every topic with the same leader may need to be
refreshed as well.
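
What I have in mind is roughly the following. This is a hypothetical sketch,
not the producer's actual code: topics_led_by, on_connection_failure, and
the fetch_leader callback are names I made up for illustration.

```python
# Hypothetical sketch: when a connection to a broker fails, refresh every
# topic whose cached leader is that broker, not just the topic being sent.

def topics_led_by(cached_leaders, broker):
    """Return the topics whose cached leader is the given broker."""
    return sorted(t for t, b in cached_leaders.items() if b == broker)


def on_connection_failure(cached_leaders, failed_broker, fetch_leader):
    """Refresh all topics led by failed_broker; return the refreshed topics.

    fetch_leader is an assumed callback standing in for a metadata request:
    it takes a topic name and returns the current leader for that topic.
    """
    stale = topics_led_by(cached_leaders, failed_broker)
    for topic in stale:
        cached_leaders[topic] = fetch_leader(topic)
    return stale
```

For example, with a cache of {topicX: serverA, topicY: serverA,
topicZ: serverC}, a reset on serverA would refresh topicX and topicY
together and leave topicZ alone.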


On Mon, Jun 24, 2013 at 10:14 PM, Jason Rosenberg <[EMAIL PROTECTED]> wrote: