Kafka, mail # user - async producer behavior if zk and/or kafka cluster goes away...


Re: async producer behavior if zk and/or kafka cluster goes away...
Neha Narkhede 2012-11-20, 18:35
>> So the VIP is only for getting meta-data?  After that, under the covers,
>> the producers will make direct connections to individual kafka hosts that
>> they learned about from connecting through the VIP?

That's right.

Thanks for your questions!
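
The behavior discussed in this thread (the configured broker list or VIP is used only to fetch metadata, produce requests go directly to the broker that hosts the partition, and an error from the wrong broker triggers a metadata refresh) can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions, not Kafka's client code: `Broker`, `Producer`, `NotLeaderError`, the host names, and the `network` dict that stands in for TCP connections are all hypothetical.

```python
from dataclasses import dataclass, field


class NotLeaderError(Exception):
    """Hypothetical error: this broker does not host the requested partition."""


@dataclass
class Broker:
    """Toy in-memory broker. `cluster` is shared state standing in for the real cluster."""
    address: str
    cluster: dict  # (topic, partition) -> address of the broker hosting it
    log: list = field(default_factory=list)

    def fetch_metadata(self):
        # Any live broker can hand back the full topology.
        return dict(self.cluster)

    def produce(self, topic, partition, message):
        # A broker that does not host the partition fails the request right away.
        if self.cluster[(topic, partition)] != self.address:
            raise NotLeaderError(self.address)
        self.log.append((topic, partition, message))


class Producer:
    def __init__(self, bootstrap, network):
        self.bootstrap = bootstrap  # configured broker list, e.g. a single VIP
        self.network = network      # address -> Broker (stand-in for connections)
        self.metadata = {}
        self.refresh_metadata()

    def refresh_metadata(self):
        # Only this step touches the configured broker list / VIP.
        for address in self.bootstrap:
            broker = self.network.get(address)
            if broker is not None:
                self.metadata = broker.fetch_metadata()
                return
        raise ConnectionError("no bootstrap broker reachable")

    def send(self, topic, partition, message):
        for _ in range(2):  # first attempt plus one retry after a refresh
            target = self.metadata[(topic, partition)]
            try:
                # Data goes straight to the hosting broker, never via the VIP.
                self.network[target].produce(topic, partition, message)
                return target
            except NotLeaderError:
                self.refresh_metadata()
        raise RuntimeError("metadata still stale after refresh")


cluster = {("events", 0): "kafka1:9092", ("events", 1): "kafka2:9092"}
brokers = {a: Broker(a, cluster) for a in ("kafka1:9092", "kafka2:9092", "kafka3:9092")}
network = dict(brokers)
network["kafka-vip:9092"] = brokers["kafka1:9092"]  # the VIP forwards to some live broker

producer = Producer(["kafka-vip:9092"], network)
first = producer.send("events", 1, "a")    # routed directly to kafka2
cluster[("events", 1)] = "kafka3:9092"     # the partition migrates to a new broker
second = producer.send("events", 1, "b")   # stale metadata: error, refresh, retry
print(first, second)
```

In this toy model the client configuration never changes when the partition moves; the only requirement is that at least one address in the configured list still resolves to a live broker.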

On Tue, Nov 20, 2012 at 10:28 AM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:
> Ok,
>
> I think I understand (so I'll need to change some things in our set up to
> work with 0.8).
>
> So the VIP is only for getting meta-data?  After that, under the covers,
> the producers will make direct connections to individual kafka hosts that
> they learned about from connecting through the VIP?
>
> Jason
>
> On Tue, Nov 20, 2012 at 10:20 AM, Jay Kreps <[EMAIL PROTECTED]> wrote:
>
>> I think the confusion is that we are answering a slightly different
>> question than what you are asking. If I understand correctly, you are asking, "do I
>> need to put ALL the kafka broker urls into the config for the client and
>> will this need to be updated if I add machines to the cluster?".
>>
>> The answer to both these questions is no. The broker list configuration
>> will work exactly as your zookeeper configuration worked. Namely you must
>> have the URL of at least one operational broker in the cluster, and the
>> producer will use this/these urls to fetch a complete topology of the
>> cluster (all nodes, and what partitions they have). If you add kafka
>> brokers or migrate partitions from one broker to another clients will
>> automatically discover this and adjust appropriately with no need for
>> config changes. The broker list you give is only used when fetching
>> metadata; all producer requests go directly to the appropriate broker. As a
>> result you can use a VIP for the broker list if you like, without having
>> any of the actual data you send go through that VIP.
>>
>> As Neha and Jun mentioned there were a couple of reasons for this change:
>> 1. If you use kafka heavily everything ends up connecting to zk and any
>> operational change to zk or upgrade becomes immensely difficult.
>> 2. Zk support outside java is spotty at best.
>> 3. In effect we weren't using zk for what it is good at--discovery--because
>> discovery is asynchronous. That is if you try to send to the wrong broker
>> we need to give you an error right away and have you update your metadata,
>> and this will likely happen before the zk watcher fires. Plus once you
>> handle this case you don't need the watcher. As a result zk is just being
>> used as a key-value store.
>>
>> -Jay
>>
>>
>>
>> On Tue, Nov 20, 2012 at 9:44 AM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:
>>
>> > Ok,
>> >
>> > So, I'm still wrapping my mind around this.  I liked being able to use zk
>> > for all clients, since it made it very easy to think about how to update
>> > the kafka cluster.  E.g. how to add new brokers, how to move them all to
>> > new hosts entirely, etc., without having to redeploy all the clients. The
>> > new brokers will simply advertise their new location via zk, and all
>> > clients will pick it up.
>> >
>> > Requiring use of a configured broker.list for each client means that
>> > 1000's of deployed apps need to be updated any time the kafka cluster
>> > changes, no?  (Or am I not understanding?).
>> >
>> > You mention that auto-discovery of new brokers will still work, is that
>> > dependent on the existing configured broker.list set still being
>> > available also?
>> >
>> > I can see though how this will greatly reduce the load on zookeeper.
>> >
>> > Jason
>> >
>> >
>> >
>> > On Tue, Nov 20, 2012 at 9:03 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
>> >
>> > > Jason,
>> > >
>> > > Auto discovery of new brokers and rolling restart of brokers are still
>> > > supported in 0.8. It's just that most of the ZK related logic is moved
>> > > to the broker.
>> > >
>> > > There are 2 reasons why we want to remove zkclient from the client.
>> > >
>> > > 1. If the client goes to GC, it can cause zk session expiration and
>> > > cause churns in the client and extra load on the zk server.