0.8.0 HEAD from 3/4/2013.
While thinking through how to build a robust SimpleConsumer, I ran some
failure tests today and want to make sure I understand what is going on.
FYI, I know that I should be doing a metadata lookup to find the leader, but
I wanted to see what happens when things are going well and the leader
changes between requests, or when I've cached the leader and try to connect
without paying the cost of a leader lookup.
First test: connect to a Broker that is a 'copy' of the topic/partition but
not the leader. Get an error '5' which maps to
Why didn't I get ErrorMapping.NotLeaderForPartitionCode or something else
to tell me I'm not talking to the Leader? 'not available' implies something
is wrong with replication. But connecting to the leader Broker everything
Second test: connect to a Broker that is neither the leader nor a copy, and
I get error 3, unknown topic or partition. Makes sense.
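For what it's worth, here is a minimal sketch of how client code might react to the two codes seen in these tests. Everything in it is hypothetical naming on my part (FetchErrorTriage, Action, the constant names); only the values 3 and 5 come from the tests above, and treating 0 as "no error" is an assumption. The idea is simply that both codes point to a stale cached leader, so the response is the same: look the leader up again and retry.

```java
// Sketch only: class, enum, and constant names are made up for illustration.
final class FetchErrorTriage {
    enum Action { CONTINUE, REFRESH_LEADER_AND_RETRY, FAIL }

    // Assumption: 0 means the fetch succeeded.
    static final short NO_ERROR = 0;
    // Observed when talking to a broker that is neither leader nor copy.
    static final short CODE_FROM_NON_REPLICA = 3;
    // Observed when talking to a broker that holds a copy but is not leader.
    static final short CODE_FROM_NON_LEADER_COPY = 5;

    /** Maps a fetch error code to what the consumer loop should do next. */
    static Action triage(short code) {
        switch (code) {
            case NO_ERROR:
                return Action.CONTINUE;
            case CODE_FROM_NON_REPLICA:
            case CODE_FROM_NON_LEADER_COPY:
                // Either way the cached leader is wrong: redo the lookup.
                return Action.REFRESH_LEADER_AND_RETRY;
            default:
                // Anything else is left to the caller to handle explicitly.
                return Action.FAIL;
        }
    }
}
```

Since code 3 could also mean a genuinely wrong topic name, the retry on REFRESH_LEADER_AND_RETRY should probably be bounded rather than looping forever.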
Third test: connect to the leader and, while reading data, shut down the
leader Broker via the command line. I get some IOExceptions, then Connection
Refused on the reconnect. (Note that Connection Refused is the exception
raised; the IOException was written to the logs but not raised to my code.)
I'm not sure of the best way to code a recovery from this without assuming
the worst every time. Could there be some notice from Kafka that the
connection to the leader was closed due to a shutdown, rather than just
Connection Refused errors, so I can respond differently? Something like
'Broker has closed connection due to shutdown', so I know to sleep for a
second before going through the leader lookup logic again? Or, ideally,
Kafka would know it was a clean shutdown and automatically transition to
the new leader.
Knowing it was a clean shutdown would also allow me to treat the clean
shutdown as a normal occurrence vs. an exception when something goes wrong.
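In the absence of such a notice, the "assume the worst" recovery I have in mind looks roughly like the sketch below. It is not Kafka API: LeaderFailover, LeaderLookup, and Fetch are hypothetical stand-ins for my own wrappers around the metadata request and the SimpleConsumer fetch, and the attempt count and backoff are arbitrary parameters. Any IOException (Connection Refused included) is treated the same way: sleep, look the leader up again, retry.

```java
import java.io.IOException;

// Sketch only: these types are illustrative, not part of Kafka.
final class LeaderFailover {
    /** Hypothetical hook: would wrap a metadata request and return the current leader as "host:port". */
    interface LeaderLookup { String findLeader() throws IOException; }

    /** Hypothetical hook: would wrap one fetch against the given broker. */
    interface Fetch { void fetchFrom(String broker) throws IOException; }

    /**
     * Tries the cached leader first. On an IOException from the fetch it
     * sleeps, redoes the leader lookup, and retries, up to maxAttempts
     * fetches in total. Returns the broker that finally served the fetch.
     * An IOException from the lookup itself is not retried and propagates.
     */
    static String fetchWithFailover(String cachedLeader, LeaderLookup lookup, Fetch fetch,
                                    int maxAttempts, long backoffMs)
            throws IOException, InterruptedException {
        String broker = cachedLeader;
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                fetch.fetchFrom(broker);
                return broker;
            } catch (IOException e) {
                last = e;
                if (attempt == maxAttempts) {
                    break; // out of attempts, report the last failure
                }
                Thread.sleep(backoffMs);      // give the cluster time to pick a new leader
                broker = lookup.findLeader(); // cached leader is assumed stale
            }
        }
        throw new IOException("no leader reachable after " + maxAttempts + " attempts", last);
    }
}
```

The downside is exactly the one described above: a clean shutdown and a real failure both go through the same sleep-and-retry path, because the client cannot tell them apart.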
Neha Narkhede 2013-03-05, 16:43