Kafka >> mail # user >> async producer behavior if zk and/or kafka cluster goes away...

Re: async producer behavior if zk and/or kafka cluster goes away...
I checked out trunk.  I guess I assumed that included the latest 0.8.  Is
that not right?  Am I just looking at 0.7.x+?

Honestly, I don't think losing the ability to rely on zookeeper in producer
code would be a positive thing.  How does that affect discovery of a kafka
cluster under dynamic conditions?  We'd expect a much higher SLA for the
zookeeper cluster than for kafka, and we'd like to be able to freely do
rolling restarts of the kafka cluster, etc.

Also, it seems a bit asymmetric to use zk for the kafka brokers and
consumers, but not the producers.
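For reference, this is roughly the choice as I understand it, sketched as the
two 0.7-era producer properties discussed in this thread (zk.connect vs.
broker.list); host names and broker ids here are just made-up examples:

```properties
# discovery via zookeeper (what we'd like to keep; hosts are examples)
zk.connect=zk1:2181,zk2:2181,zk3:2181

# static alternative: no zk, brokers listed explicitly as id:host:port
# (feels restrictive under rolling restarts, as noted above)
broker.list=1:broker1:9092,2:broker2:9092
```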


On Mon, Nov 19, 2012 at 8:50 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:

> In 0.8 there is no way to use zookeeper from the producer and no connection
> from the client. There isn't even a way to configure a zk connection. Are
> you sure you checked out the 0.8 branch?
> Check the code you've got:
> jkreps-mn:kafka-0.8 jkreps$ svn info
> Path: .
> URL: https://svn.apache.org/repos/asf/incubator/kafka/branches/0.8
> Repository Root: https://svn.apache.org/repos/asf
> The key is that it should have come from the URL kafka/branches/0.8.
> -Jay
> On Mon, Nov 19, 2012 at 3:30 PM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:
> > Regarding the producer/zk connection: if I am using zk to discover the
> > kafka cluster, doesn't the producer get updates if zk's knowledge of
> > the cluster changes?  Or does it only reconsult zk if the particular
> > kafka node it was "getting metadata" from goes away?  Should I not be
> > using "zk.connect" but instead "broker.list" when using a producer
> > (that would seem restrictive)?  What I've noticed is that the instant
> > the zk server is taken down, my producer immediately starts logging
> > connection errors to zk, every second, and never stops this logging
> > until zk comes back.  So it certainly feels like the producer is
> > attempting to maintain a direct connection to zk.  I suppose I
> > expected it to try for the connection timeout period (e.g. 6000ms),
> > and then give up until the next send request, etc.
> >
> > Perhaps what it should do is make that initial zk connection to find
> > the kafka broker list, then shut down the zk connection if it doesn't
> > really need it after that, recreating it only if it can no longer
> > make contact with the kafka cluster.
> >
> > For the async queuing behavior, I agree, it's difficult to respond to
> > a send request with an exception when the sending is done
> > asynchronously, in a different thread.  However, this is the behavior
> > when the producer is started initially with no zk available (e.g.
> > producer.send() gets an exception).  So the api is inconsistent, in
> > that it treats the unavailability of zk differently depending on
> > whether zk was unavailable at initial startup, vs. a subsequent zk
> > outage after previously having been available.
> >
> > I am not too concerned about not having a 100% guarantee that if I
> > successfully call producer.send(), the message was actually
> > delivered.  But it would be nice to have some way to know the current
> > health of the producer, perhaps some sort of "producerStatus()"
> > method.  If the async sending thread is having issues sending, it
> > might be nice to expose that to the client.  Also, if the current
> > producerStatus() is not healthy, then I think it might be ok to stop
> > accepting new messages (e.g. producer.send() could throw an Exception
> > in that case).
> >
> > Returning a Future for each message sent seems a bit unscalable; I'm
> > not sure clients want to be tying up resources waiting for Futures
> > all the time either.
> >
> > I'm also seeing that if kafka goes down while zk stays up, subsequent
> > calls to producer.send() fail immediately with an exception
> > ("partition is null").  I think this makes sense, although in that
> > case, what is the fate of previously buffered but unsent messages?
> > Are they all lost?