Kafka >> mail # user >> async producer behavior if zk and/or kafka cluster goes away...

Re: async producer behavior if zk and/or kafka cluster goes away...
I think the confusion is that we are answering a slightly different
question than the one you are asking. If I understand correctly, you are
asking, "do I need to put ALL the kafka broker URLs into the config for the
client, and will this need to be updated if I add machines to the cluster?".

The answer to both these questions is no. The broker list configuration
will work exactly as your zookeeper configuration worked. Namely, you must
have the URL of at least one operational broker in the cluster, and the
producer will use this URL (or these URLs) to fetch a complete topology of
the cluster (all nodes, and what partitions they have). If you add kafka
brokers or migrate partitions from one broker to another, clients will
automatically discover this and adjust appropriately with no need for
config changes. The broker list you give is only used when fetching
metadata; all producer requests go directly to the appropriate broker. As a
result, you can use a VIP for the broker list if you like, without having
any of the actual data you send go through that VIP.
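The bootstrap flow described above can be sketched as follows. This is a minimal sketch assuming a toy in-memory cluster: `Broker`, `Cluster`, `Producer`, `fetch_metadata` and the `b1:9092`-style host names are hypothetical, not the real Kafka 0.8 client classes. It shows that the producer only needs one reachable entry in its broker list, learns the whole topology from it, and then sends data directly to the partition leader, which need not be in the list.

```python
# Minimal sketch, assuming a toy in-memory cluster. Broker, Cluster and
# Producer are hypothetical names, not the real Kafka 0.8 client classes.


class Broker:
    def __init__(self, host):
        self.host = host
        self.up = True
        self.log = []        # messages this broker has accepted
        self.cluster = None  # set when the broker joins a cluster

    def fetch_metadata(self):
        # Any live broker can describe the whole cluster, not just itself.
        if not self.up:
            raise ConnectionError(f"{self.host} is down")
        return self.cluster.topology()

    def produce(self, topic, partition, message):
        self.log.append((topic, partition, message))


class Cluster:
    def __init__(self, hosts):
        self.brokers = {}
        self.leaders = {}    # (topic, partition) -> host of the leader
        for host in hosts:
            broker = Broker(host)
            broker.cluster = self
            self.brokers[host] = broker

    def topology(self):
        # Snapshot of all brokers plus the leader of every partition.
        return {"brokers": sorted(self.brokers), "leaders": dict(self.leaders)}


class Producer:
    def __init__(self, broker_list, cluster):
        # broker_list needs only one reachable entry
        # (in a real deployment this could be a single VIP).
        self.broker_list = list(broker_list)
        self.cluster = cluster  # stands in for the network
        self.metadata = None

    def refresh_metadata(self):
        for host in self.broker_list:
            broker = self.cluster.brokers.get(host)
            if broker is None:
                continue
            try:
                self.metadata = broker.fetch_metadata()
                return
            except ConnectionError:
                continue  # fall through to the next bootstrap entry
        raise RuntimeError("no broker in broker_list is reachable")

    def send(self, topic, partition, message):
        if self.metadata is None:
            self.refresh_metadata()
        leader = self.metadata["leaders"][(topic, partition)]
        # The data goes straight to the partition leader, which need not
        # appear in broker_list at all.
        self.cluster.brokers[leader].produce(topic, partition, message)
        return leader


cluster = Cluster(["b1:9092", "b2:9092", "b3:9092"])
cluster.leaders[("events", 0)] = "b3:9092"
cluster.brokers["b1:9092"].up = False

producer = Producer(["b1:9092", "b2:9092"], cluster)
print(producer.send("events", 0, b"hello"))  # b3:9092
```

Here b1 is down and b3 was never configured, yet the producer still reaches b3 by way of the metadata it got from b2.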

As Neha and Jun mentioned, there were a few reasons for this change:
1. If you use kafka heavily, everything ends up connecting to zk, and any
operational change to zk or upgrade becomes immensely difficult.
2. Zk support outside java is spotty at best.
3. In effect we were not using zk for what it is good at, namely
discovery, because discovery is asynchronous. That is, if you try to send
to the wrong broker we need to give you an error right away and have you
update your metadata, and this will likely happen before the zk watcher
fires. Plus, once you handle this case you don't need the watcher. As a
result, zk is just being used as a key-value store.
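Point 3 can be sketched as a send loop that trusts its cached metadata until a broker rejects a request, then refetches metadata and retries immediately, with no ZooKeeper watch involved. This is a minimal sketch assuming a toy in-memory cluster; `NotLeaderError`, `move_partition`, `max_retries` and the host names are hypothetical, chosen only for illustration.

```python
# Minimal sketch, assuming a toy in-memory cluster. NotLeaderError, Broker,
# Cluster and Producer are hypothetical names used only for illustration.


class NotLeaderError(Exception):
    """A broker was asked to accept data for a partition it does not lead."""


class Broker:
    def __init__(self, host, cluster):
        self.host = host
        self.cluster = cluster
        self.log = []

    def metadata(self):
        return dict(self.cluster.leaders)

    def produce(self, topic, partition, message):
        # The broker checks leadership on every request and fails fast.
        if self.cluster.leaders.get((topic, partition)) != self.host:
            raise NotLeaderError(f"{self.host} does not lead {topic}-{partition}")
        self.log.append(message)


class Cluster:
    def __init__(self, hosts):
        self.brokers = {h: Broker(h, self) for h in hosts}
        self.leaders = {}  # (topic, partition) -> leader host

    def move_partition(self, topic, partition, new_leader):
        # Simulates a partition migration; no client is notified.
        self.leaders[(topic, partition)] = new_leader


class Producer:
    def __init__(self, bootstrap_host, cluster, max_retries=3):
        self.bootstrap_host = bootstrap_host
        self.cluster = cluster
        self.max_retries = max_retries
        self.leaders = {}         # cached metadata, possibly stale
        self.metadata_fetches = 0

    def refresh_metadata(self):
        self.metadata_fetches += 1
        self.leaders = self.cluster.brokers[self.bootstrap_host].metadata()

    def send(self, topic, partition, message):
        if (topic, partition) not in self.leaders:
            self.refresh_metadata()
        for _ in range(self.max_retries):
            host = self.leaders[(topic, partition)]
            try:
                self.cluster.brokers[host].produce(topic, partition, message)
                return host
            except NotLeaderError:
                # Stale cache: re-fetch metadata and retry right away rather
                # than waiting for a ZooKeeper watch to fire.
                self.refresh_metadata()
        raise RuntimeError("gave up after repeated NotLeaderError responses")


cluster = Cluster(["b1:9092", "b2:9092"])
cluster.move_partition("events", 0, "b1:9092")
producer = Producer("b1:9092", cluster)

first = producer.send("events", 0, b"first")    # cached leader is b1
cluster.move_partition("events", 0, "b2:9092")  # leader changes silently
second = producer.send("events", 0, b"second")  # b1 rejects, refresh, retry
print(first, second)  # b1:9092 b2:9092
```

Only two metadata fetches happen: one at startup and one triggered by the rejected send. A real client would also need to handle unreachable brokers and back off between retries; the sketch leaves that out.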


On Tue, Nov 20, 2012 at 9:44 AM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:

> Ok,
> So, I'm still wrapping my mind around this.  I liked being able to use zk
> for all clients, since it made it very easy to think about how to update
> the kafka cluster.  E.g. how to add new brokers, how to move them all to
> new hosts entirely, etc., without having to redeploy all the clients.  The
> new brokers will simply advertise their new location via zk, and all
> clients will pick it up.
> Requiring a configured broker.list for each client means that
> 1000s of deployed apps need to be updated any time the kafka cluster
> changes, no?  (Or am I not understanding?)
> You mention that auto-discovery of new brokers will still work; is that
> dependent on the existing configured broker.list set still being available
> also?
> I can see though how this will greatly reduce the load on zookeeper.
> Jason
> On Tue, Nov 20, 2012 at 9:03 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
> > Jason,
> >
> > Auto discovery of new brokers and rolling restart of brokers are still
> > supported in 0.8. It's just that most of the ZK related logic is moved to
> > the broker.
> >
> > There are 2 reasons why we want to remove zkclient from the client.
> >
> > 1. If the client goes into a long GC pause, it can cause zk session
> > expiration, which causes churn in the client and extra load on the zk
> > server.
> > 2. This simplifies the client code and makes the implementation of
> > non-java clients easier.
> >
> > In 0.8, we removed the zk dependency from the producer. Post 0.8, we plan
> > to see if we can do the same thing for the consumer (though more
> > involved). This shouldn't reduce any existing functionality of the client
> > though. Feel free to let us know if you still have concerns.
> >
> > Thanks,
> >
> > Jun
> >
> >
> >
> > On Tue, Nov 20, 2012 at 7:57 AM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:
> >
> > > I checked out trunk.  I guess I assumed that included the latest 0.8.
> > > Is that not right?  Am I just looking at 0.7.x+?
> > >
> > > Honestly, I don't think it would be a positive thing not to be able to
> > > rely on zookeeper in producer code.  How does that affect the discovery of a
> > > kafka cluster under dynamic conditions?  We'd expect to have a much