Kafka, mail # user - Suggestion on ZkClient usage in Kafka


Bae, Jae Hyeon 2012-11-19, 06:35
Sybrandy, Casey 2012-11-19, 13:31
Jun Rao 2012-11-19, 16:07
Jason Rosenberg 2012-11-19, 20:10
Neha Narkhede 2012-11-19, 20:20
Jason Rosenberg 2012-11-19, 21:29
Jun Rao 2012-11-20, 05:23
David Arthur 2012-11-20, 15:54
Jason Rosenberg 2012-11-20, 16:04
Jun Rao 2012-11-20, 16:15
Jun Rao 2012-11-20, 16:31
Bae, Jae Hyeon 2012-11-20, 17:58
Neha Narkhede 2012-11-20, 18:02
David Arthur 2012-11-20, 18:20
Re: Suggestion on ZkClient usage in Kafka
Jun Rao 2012-11-20, 18:42
You can try to put all brokers behind a VIP and expose the VIP to the
producer. Without a VIP, changing the broker list takes the same amount of
effort as moving a zk cluster to a new set of hosts.

Thanks,

Jun

On Tue, Nov 20, 2012 at 10:20 AM, David Arthur <[EMAIL PROTECTED]> wrote:

> If I understand correctly, the brokers stay informed about one another
> through ZooKeeper and therefore any broker can give info about any other
> broker?
>
> This is an interesting approach. What would happen if your broker list
> changed dramatically over time?
>
> On Nov 20, 2012, at 1:02 PM, Neha Narkhede wrote:
>
> > This is being discussed in another thread -
> > http://markmail.org/message/mypnt7sgkqt55jb2?q=Jason+async+producer
> >
> > Basically, you want zookeeper on the producer to do just one thing -
> > notify it of changes in the liveness of brokers in the Kafka
> > cluster. In 0.8, brokers are not the entity to worry about; what we
> > care about are the replicas for the partitions that the producer
> > is sending data to, in particular just the leader replica (since only
> > the leader can accept writes for a partition).
> >
> > The producer keeps a cache of (topic, partition) -> leader-replica.
> > Now, if that cache is either empty or stale (due to changes
> > on the Kafka cluster), the next produce request will get an ACK with
> > the error code NotLeaderForPartition. That's when it
> > fires the getMetadata request that refreshes its cache. Assuming
> > you've configured your producer to retry (producer.num.retries)
> > more than once, it will succeed in sending the data the next time.
> >
> > In other words, instead of zookeeper 'notifying' us of the changes on
> > the Kafka cluster, we let the producer lazily update its
> > cache by invoking a special API on any of the Kafka brokers. That way,
> > we have far fewer connections to zk, zk upgrades
> > are easier, as are upgrades to the producer, and we still achieve the
> > goal of replica discovery.
> >
> > Thanks,
> > Neha
> >
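The lazy cache refresh described in Neha's message above can be sketched roughly as follows. This is a minimal illustration, not Kafka's actual producer code: `LazyProducer`, `fetch_metadata`, `send_to_broker`, and the `NotLeaderForPartition` exception class are all hypothetical names, the broker names are made up, and the in-memory `cluster` dict merely stands in for a real Kafka cluster. It also simplifies the empty-cache case by fetching metadata up front instead of waiting for an error ACK.

```python
from typing import Callable, Dict, Tuple

TopicPartition = Tuple[str, int]


class NotLeaderForPartition(Exception):
    """Hypothetical stand-in for the error code carried in a produce ACK."""


class LazyProducer:
    """Keeps a (topic, partition) -> leader cache and refreshes it lazily."""

    def __init__(
        self,
        fetch_metadata: Callable[[str], Dict[TopicPartition, str]],
        send_to_broker: Callable[[str, str, int, str], None],
        num_retries: int = 3,
    ) -> None:
        self._fetch_metadata = fetch_metadata    # may be served by any broker
        self._send_to_broker = send_to_broker
        self._num_retries = num_retries
        self._leaders: Dict[TopicPartition, str] = {}

    def send(self, topic: str, partition: int, message: str) -> str:
        key = (topic, partition)
        for _attempt in range(1 + self._num_retries):
            if key not in self._leaders:
                # Empty cache entry: ask a broker for current metadata.
                self._leaders.update(self._fetch_metadata(topic))
            leader = self._leaders.get(key)
            if leader is None:
                continue  # no leader known yet; try again
            try:
                self._send_to_broker(leader, topic, partition, message)
                return leader
            except NotLeaderForPartition:
                # Stale cache entry: drop it so the next attempt refreshes.
                self._leaders.pop(key, None)
        raise RuntimeError(f"could not send to {key} after retries")


# Fake single-partition cluster used only for this demonstration.
cluster: Dict[TopicPartition, str] = {("events", 0): "broker-1"}
sent = []


def fetch_metadata(topic: str) -> Dict[TopicPartition, str]:
    return {tp: b for tp, b in cluster.items() if tp[0] == topic}


def send_to_broker(broker: str, topic: str, partition: int, message: str) -> None:
    if cluster[(topic, partition)] != broker:
        raise NotLeaderForPartition()
    sent.append((broker, message))


producer = LazyProducer(fetch_metadata, send_to_broker, num_retries=2)
first = producer.send("events", 0, "a")
cluster[("events", 0)] = "broker-2"      # leadership moves to another broker
second = producer.send("events", 0, "b")  # one failed attempt, then a refresh
```

In this sketch the producer never holds a ZooKeeper connection; it only learns about the leadership change when a send fails, which is why the retry count needs to allow at least one more attempt after the refresh.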
> > On Tue, Nov 20, 2012 at 9:58 AM, Bae, Jae Hyeon <[EMAIL PROTECTED]> wrote:
> >> If the producer does not require zk.connect, how can it
> >> recognize new brokers or brokers that went down?
> >>
> >> On Tue, Nov 20, 2012 at 8:31 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
> >>> David,
> >>>
> >>> The change in 0.8 is that instead of requiring zk.connect, we require
> >>> broker.list. In both cases, you typically provide a list of hosts and
> >>> ports. Functionality-wise, they achieve the same thing, i.e., the
> >>> producer is able to send the data to the right broker. Are you saying
> >>> that zk.connect is more convenient? One benefit of using broker.list is
> >>> that one can provide a VIP as the only host. This makes it easy to
> >>> add/remove brokers since no producer-side config needs to be changed.
> >>> Changing hosts in zk.connect, on the other hand, requires config changes
> >>> in the client. Another reason for removing zkclient in the producer is
> >>> that if the client GCs, it can cause churn in the producer and extra load
> >>> on the zk server. Since our producer can be embedded in any client, it's
> >>> hard for us to control the GC rate. So, removing zkclient in the producer
> >>> relieves the potential pressure from client GC.
> >>>
> >>> We still rely on ZK for failure detection and leader election on the
> >>> broker and the consumer, though.
> >>>
> >>> Thanks,
> >>>
> >>> Jun
> >>>
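To make the broker.list point in Jun's message above concrete, here is a small sketch of how a client might read such a setting. The thread only says the value is "a list of hosts and ports", so the comma-separated `host:port` format is an assumption, as are the function name `parse_broker_list`, the hostnames, and port 9092.

```python
from typing import List, Tuple


def parse_broker_list(value: str) -> List[Tuple[str, int]]:
    """Parse an assumed 'host:port,host:port' string into (host, port) pairs."""
    brokers: List[Tuple[str, int]] = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing commas
        host, _, port = entry.rpartition(":")
        if not host:
            raise ValueError(f"expected host:port, got {entry!r}")
        brokers.append((host, int(port)))
    return brokers


# Listing brokers explicitly: the config must change when hosts change.
explicit = parse_broker_list("kafka1.example.com:9092,kafka2.example.com:9092")

# Listing only a VIP: brokers can be added or removed behind it
# without touching the producer-side config.
behind_vip = parse_broker_list("kafka-vip.example.com:9092")
```

With the VIP variant the producer config holds a single stable entry, which is the convenience Jun points to.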
> >>> On Tue, Nov 20, 2012 at 7:54 AM, David Arthur <[EMAIL PROTECTED]> wrote:
> >>>
> >>>>
> >>>> On Nov 20, 2012, at 12:23 AM, Jun Rao wrote:
> >>>>
> >>>>> Jason,
> >>>>>
> >>>>> In 0.8, the producer doesn't use zkclient at all. You just need to set
> >>>>> broker.list.
> >>>>
> >>>> This seems like a regression in functionality. For me, one of the
> >>>> benefits of Kafka is only needing to know about ZooKeeper.
> >>>>
> >>>>> A number of things have changed in 0.8. First, the number of
> >>>>> partitions of a topic is global in a cluster and they don't really