Kafka, mail # user - async producer behavior if zk and/or kafka cluster goes away...


Re: async producer behavior if zk and/or kafka cluster goes away...
Jun Rao 2012-11-20, 18:39
That's right. VIP is only used for getting metadata. All producer send
requests are through direct RPC to each broker.

Thanks,

Jun
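
The behavior confirmed above can be illustrated with a small in-memory sketch. This is not the Kafka client or its wire protocol: `Cluster`, `Broker`, `Producer`, `NotLeaderError`, the `VIP` address, and the host names are all hypothetical stand-ins chosen for illustration. The sketch assumes the producer is configured with only a VIP, uses it solely to fetch the partition-to-broker map, sends each message directly to the owning broker, and refreshes its metadata when a send is rejected.

```python
from dataclasses import dataclass, field

VIP = "kafka-vip:9092"  # hypothetical load-balanced address, used for metadata only


class NotLeaderError(Exception):
    """Raised when a message is sent to a broker that does not host the partition."""


@dataclass
class Broker:
    address: str
    partitions: set = field(default_factory=set)  # (topic, partition) pairs hosted here
    received: list = field(default_factory=list)  # messages delivered directly here


class Cluster:
    """Toy in-memory stand-in for a Kafka cluster (not the real wire protocol)."""

    def __init__(self, brokers):
        self.brokers = {b.address: b for b in brokers}

    def fetch_metadata(self, address):
        # The VIP or any live broker can answer a metadata request.
        if address != VIP and address not in self.brokers:
            raise ConnectionError(f"cannot reach {address}")
        return {tp: b.address for b in self.brokers.values() for tp in b.partitions}

    def send_direct(self, address, tp, message):
        # Data is accepted only by the broker that currently hosts the partition.
        broker = self.brokers.get(address)
        if broker is None or tp not in broker.partitions:
            raise NotLeaderError(f"{address} does not host {tp}")
        broker.received.append((tp, message))

    def move_partition(self, tp, to_address):
        # Simulates migrating a partition, possibly to a brand-new broker.
        for b in self.brokers.values():
            b.partitions.discard(tp)
        self.brokers.setdefault(to_address, Broker(to_address)).partitions.add(tp)


class Producer:
    def __init__(self, cluster, broker_list):
        self.cluster = cluster
        self.broker_list = list(broker_list)  # only ever used to fetch metadata
        self.leaders = {}                     # cached (topic, partition) -> broker address
        self.metadata_fetches = 0

    def refresh_metadata(self):
        # Try each configured address until one answers.
        for address in self.broker_list:
            try:
                self.leaders = self.cluster.fetch_metadata(address)
            except ConnectionError:
                continue
            self.metadata_fetches += 1
            return
        raise ConnectionError("no address in broker_list is reachable")

    def send(self, topic, partition, message):
        tp = (topic, partition)
        if tp not in self.leaders:
            self.refresh_metadata()
        try:
            self.cluster.send_direct(self.leaders[tp], tp, message)
        except NotLeaderError:
            # Stale metadata: the error itself triggers the refresh, then one retry.
            self.refresh_metadata()
            self.cluster.send_direct(self.leaders[tp], tp, message)
        return self.leaders[tp]


cluster = Cluster([Broker("b1:9092", {("events", 0)}),
                   Broker("b2:9092", {("events", 1)})])
producer = Producer(cluster, [VIP])
first = producer.send("events", 0, "m1")          # delivered to b1 directly
cluster.move_partition(("events", 0), "b3:9092")  # partition moves to a new broker
second = producer.send("events", 0, "m2")         # rejected by b1, refreshed, sent to b3
```

In this sketch the producer's configuration never changes even though a new broker joins, and no message passes through the VIP; the rejected send, rather than a ZooKeeper watcher, is what prompts the metadata refresh.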

On Tue, Nov 20, 2012 at 10:28 AM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:

> Ok,
>
> I think I understand (so I'll need to change some things in our setup to
> work with 0.8).
>
> So the VIP is only for getting meta-data?  After that, under the covers,
> the producers will make direct connections to individual kafka hosts that
> they learned about from connecting through the VIP?
>
> Jason
>
> On Tue, Nov 20, 2012 at 10:20 AM, Jay Kreps <[EMAIL PROTECTED]> wrote:
>
> > I think the confusion is that we are answering a slightly different
> > question than the one you are asking. If I understand correctly, you are
> > asking, "do I need to put ALL the kafka broker urls into the config for
> > the client and will this need to be updated if I add machines to the
> > cluster?"
> >
> > The answer to both these questions is no. The broker list configuration
> > will work exactly as your zookeeper configuration worked. Namely, you must
> > have the URL of at least one operational broker in the cluster, and the
> > producer will use this/these urls to fetch a complete topology of the
> > cluster (all nodes, and what partitions they have). If you add kafka
> > brokers or migrate partitions from one broker to another, clients will
> > automatically discover this and adjust appropriately with no need for
> > config changes. The broker list you give is only used when fetching
> > metadata; all producer requests go directly to the appropriate broker. As
> > a result you can use a VIP for the broker list if you like, without having
> > any of the actual data you send go through that VIP.
> >
> > As Neha and Jun mentioned, there were a few reasons for this change:
> > 1. If you use kafka heavily, everything ends up connecting to zk, and any
> > operational change to zk or upgrade becomes immensely difficult.
> > 2. Zk support outside java is spotty at best.
> > 3. In effect we weren't using zk for what it is good at--discovery--because
> > discovery is asynchronous. That is, if you try to send to the wrong broker
> > we need to give you an error right away and have you update your metadata,
> > and this will likely happen before the zk watcher fires. Plus, once you
> > handle this case you don't need the watcher. As a result zk is just being
> > used as a key-value store.
> >
> > -Jay
> >
> >
> >
> > On Tue, Nov 20, 2012 at 9:44 AM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:
> >
> > > Ok,
> > >
> > > So, I'm still wrapping my mind around this.  I liked being able to use
> > > zk for all clients, since it made it very easy to think about how to
> > > update the kafka cluster.  E.g. how to add new brokers, how to move them
> > > all to new hosts entirely, etc., without having to redeploy all the
> > > clients.  The new brokers will simply advertise their new location via
> > > zk, and all clients will pick it up.
> > >
> > > Requiring a configured broker.list for each client means that 1000's
> > > of deployed apps need to be updated any time the kafka cluster changes,
> > > no?  (Or am I not understanding?).
> > >
> > > You mention that auto-discovery of new brokers will still work; is that
> > > dependent on the existing configured broker.list set still being
> > > available also?
> > >
> > > I can see though how this will greatly reduce the load on zookeeper.
> > >
> > > Jason
> > >
> > >
> > >
> > > On Tue, Nov 20, 2012 at 9:03 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
> > >
> > > > Jason,
> > > >
> > > > Auto discovery of new brokers and rolling restart of brokers are still
> > > > supported in 0.8. It's just that most of the ZK related logic is moved
> > > > to the broker.
> > > >
> > > > There are 2 reasons why we want to remove zkclient from the client.
> > > >
> > > > 1. If the client pauses for GC, it can cause zk session expiration and
> > > > cause churn in the client and extra load on the zk server.