Kafka, mail # user - async producer behavior if zk and/or kafka cluster goes away...


Re: async producer behavior if zk and/or kafka cluster goes away...
Jason Rosenberg 2012-11-19, 23:30
Regarding the producer/zk connection:  if I am using zk to discover the
kafka cluster, doesn't the producer get updates if zk's knowledge of the
cluster changes?  Or does it only reconsult zk if the particular kafka node
it was "getting metadata" from goes away?  Should I not be using a
"zk.connect" but instead a "broker.list" when using a producer (that would
seem restrictive)?  What I've noticed is that the instant the zk server is
taken down, my producer immediately starts logging connection errors to zk,
every second, and never stops this logging until zk comes back.  So it
certainly feels like the producer is attempting to maintain a direct
connection to zk.  I suppose I expected it to try for the connection
timeout period (e.g. 6000ms), and then give up, until the next send
request, etc.

Perhaps what it should do is make that initial zk connection to find the
kafka broker list, then shutdown the zk connection if it really doesn't
need it after that, until possibly recreating it if needed if it can no
longer make contact with the kafka cluster.
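For reference, here is a minimal sketch of the two configuration styles I'm weighing (0.7-era property names; the host names and broker id are placeholders, not real endpoints):

```java
import java.util.Properties;

public class ProducerConfigSketch {
    // ZooKeeper-based discovery: the producer finds live brokers via zk,
    // which is where the reconnect-logging behavior described above kicks in.
    static Properties zkDiscoveryConfig() {
        Properties props = new Properties();
        props.put("zk.connect", "zk-host:2181");  // placeholder host
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("producer.type", "async");
        return props;
    }

    // Static broker list: no zk connection from the producer at all,
    // but the list must be kept in sync with the cluster by hand.
    static Properties staticListConfig() {
        Properties props = new Properties();
        props.put("broker.list", "1:kafka-host:9092");  // placeholder id:host:port
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("producer.type", "async");
        return props;
    }

    public static void main(String[] args) {
        if (!zkDiscoveryConfig().containsKey("zk.connect")) throw new AssertionError();
        if (!staticListConfig().containsKey("broker.list")) throw new AssertionError();
        System.out.println("ok");
    }
}
```

The two are mutually exclusive in a given producer config, which is why falling back to "broker.list" just to avoid the zk reconnect noise feels restrictive.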

For the async queuing behavior, I agree, it's difficult to respond to a
send request with an exception, when the sending is done asynchronously, in
a different thread.  However, this is the behavior when the producer is
started initially, with no zk available (e.g. producer.send() gets an
exception).  So, the api is inconsistent, in that it treats the
unavailability of zk differently, depending on whether it was unavailable
at the initial startup, vs. a subsequent zk outage after previously having
been available.

I am not too concerned about not having 100% guarantee that if I
successfully call producer.send(), that I know it was actually delivered.
 But it would be nice to have some way to know the current health of the
producer, perhaps some sort of "producerStatus()" method.  If the async
sending thread is having issues sending, it might be nice to expose that to
the client.  Also, if the current producerStatus() is not healthy, then I
think it might be ok to not accept new messages to be sent (e.g.
producer.send() could throw an Exception in that case).
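To make the idea concrete: producerStatus() does not exist in the API; the sketch below is a hypothetical client-side wrapper that approximates it by tracking whether the last send attempt threw, and refusing new messages while unhealthy.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical wrapper: producerStatus() is a proposed method, not a
// real Kafka API. It just records the outcome of the last send attempt.
public class HealthTrackingSender {
    private final AtomicBoolean healthy = new AtomicBoolean(true);

    public void send(String message) {
        try {
            deliver(message);        // stand-in for the real producer.send()
            healthy.set(true);
        } catch (RuntimeException e) {
            healthy.set(false);
            throw e;                 // surface the failure to the caller
        }
    }

    public boolean producerStatus() {
        return healthy.get();
    }

    // Placeholder for the underlying async producer call.
    protected void deliver(String message) { }

    public static void main(String[] args) {
        HealthTrackingSender s = new HealthTrackingSender() {
            @Override protected void deliver(String m) {
                if (m == null) throw new RuntimeException("broker unavailable");
            }
        };
        s.send("hello");
        System.out.println(s.producerStatus());   // true
        try { s.send(null); } catch (RuntimeException ignored) { }
        System.out.println(s.producerStatus());   // false
    }
}
```

The point is only that the health signal exists somewhere the client can read it synchronously, even though the actual sending stays asynchronous.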

Returning a Future for each message sent seems a bit unscalable. I'm not
sure clients want to be tying up resources waiting on Futures all the time
either.

I'm also seeing that if  kafka goes down, while zk stays up, subsequent
calls to producer.send() fail immediately with an exception ("partition is
null").  I think this makes sense, although, in that case, what is the fate
of previously buffered but unsent messages?  Are they all lost?

But I'd expect that if zk goes down, and then kafka goes down, it would
behave the same way as if only kafka went down.  Instead, it continues
happily buffering messages, logging lots of zk connection errors, with no
way for the client code to know that things are not hunky dory.

In summary:

1. If the zk connection is not important for a producer, why continually
log zk connection errors every second, while the client behaves as if
nothing is wrong and just keeps accepting messages?
2. If the zk connection goes down, followed by kafka going down, it behaves
no differently, from the client's perspective, than if only zk went down
(it keeps accepting new messages).
3. If the zk connection stays up, but kafka goes down, producer.send() then
fails immediately with an exception (I think this makes sense).
4. we have no way of knowing if/when buffered messages are sent, once zk
and/or kafka come back online (although it appears all buffered messages
are lost in any case where kafka goes offline).
5. I'm not clear on the difference between "queue.time" and "queue.size".
If kafka is not available, but "queue.time" has expired, what happens: do
messages get dropped, or do they continue to be buffered until "queue.size"
is exhausted?
6. What happens if I call producer.close() while there are buffered
messages?  Do the messages get sent before the producer shuts down?
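For point 5, here is my current reading of the 0.7-era async queue settings, sketched as a config (please correct me if this is wrong): "queue.time" is a flush interval, "queue.size" is a capacity bound, and "queue.enqueueTimeout.ms" decides what happens once the queue is full.

```java
import java.util.Properties;

public class AsyncQueueConfigSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Flush buffered messages after this many ms, even if the batch
        // is not full (so it bounds latency, not capacity).
        props.put("queue.time", "5000");
        // Maximum number of messages held in the in-memory queue
        // (so it bounds capacity, not latency).
        props.put("queue.size", "10000");
        // Behavior when the queue is full: 0 drops new messages
        // immediately, -1 blocks the caller until space frees up.
        props.put("queue.enqueueTimeout.ms", "-1");

        if (!props.containsKey("queue.time")) throw new AssertionError();
        System.out.println(props.getProperty("queue.enqueueTimeout.ms"));
    }
}
```

If that reading is right, then with kafka unavailable an expired "queue.time" just triggers a (failing) flush attempt, and whether producer.send() eventually drops or blocks comes down to "queue.enqueueTimeout.ms" once "queue.size" is exhausted; confirmation would be appreciated.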

Jason
On Mon, Nov 19, 2012 at 2:31 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
I think the intended behavior for the async producer is the following: