Kafka >> mail # user >> Suggestion on ZkClient usage in Kafka


Re: Suggestion on ZkClient usage in Kafka
You can try to put all brokers in a vip and expose the vip to the producer.
If there is no vip, it takes the same amount of effort as moving a zk cluster
to a new set of hosts.

Thanks,

Jun

On Tue, Nov 20, 2012 at 10:20 AM, David Arthur <[EMAIL PROTECTED]> wrote:

> If I understand correctly, the brokers stay informed about one another
> through ZooKeeper, and therefore any broker can give info about any other
> broker?
>
> This is an interesting approach. What would happen if your broker list
> changed dramatically over time?
>
> On Nov 20, 2012, at 1:02 PM, Neha Narkhede wrote:
>
> > This is being discussed in another thread -
> > http://markmail.org/message/mypnt7sgkqt55jb2?q=Jason+async+producer
> >
> > Basically, you want zookeeper on the producer to do just one thing:
> > notify it of changes in the liveness of brokers in the Kafka
> > cluster. In 0.8, brokers are not the entity to worry about; what we
> > care about are the replicas for the partitions that the producer
> > is sending data to, in particular just the leader replica (since only
> > the leader can accept writes for a partition).
> >
> > The producer keeps a cache of (topic, partition) -> leader-replica.
> > Now, if that cache is either empty or stale (due to changes
> > on the Kafka cluster), the next produce request will get an ACK with
> > an error code NotLeaderForPartition. That's when it
> > fires the getMetadata request that refreshes its cache. Assuming
> > you've configured your producer to retry (producer.num.retries)
> > more than once, it will succeed sending data the next time.
> >
> > In other words, instead of zookeeper 'notifying' us of the changes on
> > the Kafka cluster, we let the producer lazily update its
> > cache by invoking a special API on any of the Kafka brokers. That way,
> > we have far fewer connections to zk, zk upgrades
> > are easier, as are upgrades to the producer, and we also achieve the
> > goal of replica discovery.
> >
> > Thanks,
> > Neha
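
The lazy cache refresh Neha describes can be sketched as a small simulation. This is an illustrative model under stated assumptions, not Kafka's client code: `FakeCluster`, `LazyProducer`, `get_metadata`, and `num_retries` are hypothetical names, and the NotLeaderForPartition error is modeled as a plain string.

```python
from typing import Dict, Tuple

TopicPartition = Tuple[str, int]

OK = "OK"
NOT_LEADER = "NotLeaderForPartition"  # error name from the thread, modeled as a string


class FakeCluster:
    """Hypothetical stand-in for a cluster: maps each partition to its leader broker id."""

    def __init__(self, leaders: Dict[TopicPartition, int]) -> None:
        self.leaders = dict(leaders)

    def get_metadata(self) -> Dict[TopicPartition, int]:
        # In the thread, any broker can answer this, so no zk connection is needed.
        return dict(self.leaders)

    def produce(self, broker_id: int, tp: TopicPartition, message: str) -> str:
        # Only the leader replica accepts writes for a partition.
        if self.leaders.get(tp) == broker_id:
            return OK
        return NOT_LEADER


class LazyProducer:
    """Keeps a (topic, partition) -> leader cache and refreshes it only on failure."""

    def __init__(self, cluster: FakeCluster, num_retries: int = 1) -> None:
        self.cluster = cluster
        self.num_retries = num_retries
        self.leader_cache: Dict[TopicPartition, int] = {}
        self.metadata_requests = 0

    def _refresh_metadata(self) -> None:
        self.metadata_requests += 1
        self.leader_cache = self.cluster.get_metadata()

    def send(self, tp: TopicPartition, message: str) -> bool:
        # One initial attempt plus num_retries retries.
        for _ in range(self.num_retries + 1):
            leader = self.leader_cache.get(tp)  # None when the cache is empty
            if leader is not None and self.cluster.produce(leader, tp, message) == OK:
                return True
            # Empty or stale cache: refresh it, then let the loop retry (if any retries remain).
            self._refresh_metadata()
        return False
```

In this model a producer with `num_retries=0` fails the first send after a leader change but still refreshes its cache, so the following send succeeds. With at least one retry, the leader change is invisible to the caller.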
> >
> > On Tue, Nov 20, 2012 at 9:58 AM, Bae, Jae Hyeon <[EMAIL PROTECTED]> wrote:
> >> If the producer does not require zk.connect, how can the
> >> producer recognize new brokers or brokers that went down?
> >>
> >> On Tue, Nov 20, 2012 at 8:31 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
> >>> David,
> >>>
> >>> The change in 0.8 is that instead of requiring zk.connect, we require
> >>> broker.list. In both cases, you typically provide a list of hosts and
> >>> ports. Functionality-wise, they achieve the same thing, i.e., the
> >>> producer is able to send the data to the right broker. Are you saying
> >>> that zk.connect is more convenient? One benefit of using broker.list is
> >>> that one can provide a vip as the only host. This makes it easy to
> >>> add/remove brokers since no producer-side config needs to be changed.
> >>> Changing hosts in zk.connect, on the other hand, requires config changes
> >>> in the client. Another reason for removing zkclient in the producer is
> >>> that if the client GCs, it can cause churn in the producer and extra load
> >>> on the zk server. Since our producer can be embedded in any client, it's
> >>> hard for us to control the GC rate. So, removing zkclient in the producer
> >>> relieves the potential pressure from client GC.
> >>>
> >>> We still rely on ZK for failure detection and leader election on the
> >>> broker and the consumer though.
> >>>
> >>> Thanks,
> >>>
> >>> Jun
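
The broker.list approach Jun describes can be illustrated with a short sketch: parse the configured addresses, then ask each in turn for cluster metadata until one answers. The comma-separated `host:port` format, the function names `parse_broker_list` and `bootstrap_metadata`, and the `fetch` callback standing in for the metadata request are all assumptions made for illustration, not Kafka's actual client API.

```python
from typing import Callable, Dict, List, Tuple

HostPort = Tuple[str, int]


def parse_broker_list(value: str) -> List[HostPort]:
    """Parse an assumed 'host1:9092,host2:9092' string into (host, port) pairs."""
    brokers: List[HostPort] = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing commas
        host, _, port = entry.rpartition(":")
        if not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {entry!r}")
        brokers.append((host, int(port)))
    return brokers


def bootstrap_metadata(
    brokers: List[HostPort],
    fetch: Callable[[str, int], Dict],
) -> Dict:
    """Try each configured address in order; the first reachable one supplies metadata."""
    for host, port in brokers:
        try:
            return fetch(host, port)
        except ConnectionError:
            continue  # this address is down, try the next one
    raise ConnectionError("no configured broker address was reachable")
```

With a vip, the configured value would hold a single entry (for example the hypothetical `kafka-vip.example.com:9092`), so brokers can be added or removed behind it without touching the producer config.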
> >>>
> >>> On Tue, Nov 20, 2012 at 7:54 AM, David Arthur <[EMAIL PROTECTED]> wrote:
> >>>
> >>>>
> >>>> On Nov 20, 2012, at 12:23 AM, Jun Rao wrote:
> >>>>
> >>>>> Jason,
> >>>>>
> >>>>> In 0.8, the producer doesn't use zkclient at all. You just need to set
> >>>>> broker.list.
> >>>>
> >>>> This seems like a regression in functionality. For me, one of the
> >>>> benefits of Kafka is only needing to know about ZooKeeper.
> >>>>
> >>>>> A number of things have changed in 0.8. First, the number of
> >>>>> partitions of a topic is global in a cluster and they don't really