Kafka, mail # user - async producer behavior if zk and/or kafka cluster goes away...


Re: async producer behavior if zk and/or kafka cluster goes away...
Jun Rao 2012-11-20, 17:03
Jason,

Auto discovery of new brokers and rolling restart of brokers are still
supported in 0.8. It's just that most of the ZK related logic is moved to
the broker.

There are 2 reasons why we want to remove zkclient from the client.

1. If the client pauses for GC, its zk session can expire, which causes
churn in the client and extra load on the zk server.
2. This simplifies the client code and makes the implementation of non-java
clients easier.

In 0.8, we removed the zk dependency from the producer. Post 0.8, we plan
to see if we can do the same thing for the consumer (though more involved).
This shouldn't reduce any existing functionality of the client though. Feel
free to let us know if you still have concerns.

Thanks,

Jun
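
To make the point about a ZooKeeper-free producer concrete, here is a minimal sketch of producer settings that bootstrap from a broker list rather than a zk connection. It only builds a java.util.Properties object, so it runs without Kafka on the classpath. The key "metadata.broker.list" is an assumption based on the released 0.8 configuration (the thread itself refers to the setting as "broker.list"), and the class name, helper name, and host names are hypothetical.

```java
import java.util.Properties;

// Hypothetical helper: producer settings that bootstrap from brokers, not ZooKeeper.
class ProducerConfigSketch {

    static Properties brokerListProps(String... brokers) {
        Properties props = new Properties();
        // Assumed 0.8 key; the thread calls this setting "broker.list".
        // Note that no "zk.connect" entry is set at all.
        props.put("metadata.broker.list", String.join(",", brokers));
        // Async mode queues messages and sends them from a background thread.
        props.put("producer.type", "async");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        return props;
    }

    public static void main(String[] args) {
        // Host names are placeholders for illustration only.
        Properties props = brokerListProps(
                "broker1.example.com:9092", "broker2.example.com:9092");
        props.list(System.out);
    }
}
```

With the Kafka 0.8 jars available, properties like these would typically be wrapped in a kafka.producer.ProducerConfig and handed to the producer. The listed brokers serve only as a starting point for fetching cluster metadata, which fits the statement above that discovery of new brokers is still supported.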

On Tue, Nov 20, 2012 at 7:57 AM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:

> I checked out trunk.  I guess I assumed that included the latest 0.8.  Is
> that not right?  Am I just looking at 0.7.x+?
>
> Honestly, I don't think it would be a positive thing not to be able to rely
> on zookeeper in producer code.  How does that affect the discovery of a
> kafka cluster under dynamic conditions?  We'd expect to have a much higher
> SLA for the zookeeper cluster than for kafka.  We'd like to be able to
> freely do rolling restarts of the kafka cluster, etc.
>
> Also, it seems a bit asymmetric to use zk for the kafka brokers and
> consumers, but not the producers.
>
> Jason
>
> On Mon, Nov 19, 2012 at 8:50 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
>
> > In 0.8 there is no way to use zookeeper from the producer and no connection
> > from the client. There isn't even a way to configure a zk connection. Are
> > you sure you checked out the 0.8 branch?
> >
> > Check the code you've got:
> > jkreps-mn:kafka-0.8 jkreps$ svn info
> > Path: .
> > URL: https://svn.apache.org/repos/asf/incubator/kafka/branches/0.8
> > Repository Root: https://svn.apache.org/repos/asf
> >
> > The key is that it should have come from the URL kafka/branches/0.8.
> >
> > -Jay
> >
> >
> > On Mon, Nov 19, 2012 at 3:30 PM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:
> >
> > > Regarding the producer/zk connection:  if I am using zk to discover the
> > > kafka cluster, doesn't the producer get updates if zk's knowledge of the
> > > cluster changes?  Or does it only reconsult zk if the particular kafka node
> > > it was "getting metadata" from goes away?  Should I not be using a
> > > "zk.connect" but instead a "broker.list" when using a producer (that would
> > > seem restrictive)?  What I've noticed is that the instant the zk server is
> > > taken down, my producer immediately starts logging connection errors to zk,
> > > every second, and never stops this logging until zk comes back.  So it
> > > certainly feels like the producer is attempting to maintain a direct
> > > connection to zk.  I suppose I expected it to try for the connection
> > > timeout period (e.g. 6000ms), and then give up, until the next send
> > > request, etc.
> > >
> > > Perhaps what it should do is make that initial zk connection to find the
> > > kafka broker list, then shut down the zk connection if it really doesn't
> > > need it after that, and recreate it only if it can no longer make contact
> > > with the kafka cluster.
> > >
> > > For the async queuing behavior, I agree, it's difficult to respond to a
> > > send request with an exception, when the sending is done asynchronously, in
> > > a different thread.  However, this is the behavior when the producer is
> > > started initially, with no zk available (e.g. producer.send() gets an
> > > exception).  So, the api is inconsistent, in that it treats the
> > > unavailability of zk differently, depending on whether it was unavailable
> > > at the initial startup, vs. a subsequent zk outage after previously having
> > > been available.
> > >
> > > I am not too concerned about not having 100% guarantee that if I
> > > successfully call producer.send(), that I know it was actually