Kafka >> mail # user >> Suggestion on ZkClient usage in Kafka


Jun Rao 2012-11-20, 16:15
Re: Suggestion on ZkClient usage in Kafka
David,

The change in 0.8 is that instead of requiring zk.connect, we require
broker.list. In both cases, you typically provide a list of hosts and
ports. Functionality-wise, they achieve the same thing, i.e., the producer
is able to send the data to the right broker. Are you saying that
zk.connect is more convenient? One benefit of using broker.list is that one
can provide a VIP as the only host. This makes it easy to add/remove
brokers since no producer-side config needs to be changed. Changing hosts
in zk.connect, on the other hand, requires config changes in the client.
Another reason for removing zkclient from the producer is that if the
client pauses for GC, it can cause churn in the producer and extra load on
the zk server. Since our producer can be embedded in any client, it's hard
for us to control the GC rate. So, removing zkclient from the producer
relieves the potential pressure from client GC.
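
To make the configuration difference concrete, here is a minimal sketch of
the two styles using plain java.util.Properties. Only the property names
zk.connect and broker.list come from this thread; the host names, ports,
and the host:port value format are assumptions for illustration, and the
exact keys in a given release may differ.

```java
import java.util.Properties;

public class ProducerConfigSketch {
    // Pre-0.8 style: the producer is pointed at ZooKeeper and discovers
    // brokers through it. Host names here are hypothetical.
    static Properties zkBasedConfig() {
        Properties props = new Properties();
        props.put("zk.connect", "zk1.example.com:2181,zk2.example.com:2181");
        return props;
    }

    // 0.8 style as described in this thread: the producer is pointed at
    // brokers directly. A single VIP in front of the brokers (hypothetical
    // name below) means brokers can be added or removed behind it without
    // touching producer config.
    static Properties brokerListConfig() {
        Properties props = new Properties();
        props.put("broker.list", "kafka-vip.example.com:9092");
        return props;
    }

    public static void main(String[] args) {
        System.out.println("zk-based:    " + zkBasedConfig());
        System.out.println("broker-list: " + brokerListConfig());
    }
}
```

With the VIP approach, the only value the producer ever sees is the VIP
address, so broker membership changes stay on the server side.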

We still rely on ZK for failure detection and leader election on the broker
and the consumer though.
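
For reference, the producer-side flow described in the quoted messages
below (look up the leader via getMetadata at startup, send directly to the
leader, refresh the metadata and retry on failure) could be sketched
roughly as follows. Everything here is a hypothetical in-memory stand-in:
BrokerClient, FakeCluster, and SketchProducer are illustrative names, not
Kafka classes, and the real request and error types are not shown.

```java
import java.util.HashMap;
import java.util.Map;

public class MetadataRetrySketch {
    /** Hypothetical stand-in for talking to brokers; not a Kafka API. */
    interface BrokerClient {
        // Returns the id of the broker currently leading the partition.
        int getMetadata(String topic, int partition);
        // Sends to the given broker; throws if it is not the live leader.
        void send(int brokerId, String topic, int partition, String message);
    }

    /** In-memory fake cluster with one partition and a movable leader. */
    static class FakeCluster implements BrokerClient {
        private int leader;
        final Map<Integer, Integer> delivered = new HashMap<>(); // brokerId -> message count

        FakeCluster(int leader) { this.leader = leader; }

        void failOverTo(int newLeader) { leader = newLeader; }

        public int getMetadata(String topic, int partition) { return leader; }

        public void send(int brokerId, String topic, int partition, String message) {
            if (brokerId != leader) {
                throw new IllegalStateException("broker " + brokerId + " is not the leader");
            }
            delivered.merge(brokerId, 1, Integer::sum);
        }
    }

    /** Producer that caches the leader and refreshes it only on failure. */
    static class SketchProducer {
        private final BrokerClient client;
        private final String topic;
        private final int partition;
        private int cachedLeader;

        SketchProducer(BrokerClient client, String topic, int partition) {
            this.client = client;
            this.topic = topic;
            this.partition = partition;
            // Startup: ask a broker, not ZooKeeper, who the leader is.
            this.cachedLeader = client.getMetadata(topic, partition);
        }

        void send(String message, int maxRetries) {
            for (int attempt = 0; ; attempt++) {
                try {
                    client.send(cachedLeader, topic, partition, message);
                    return;
                } catch (IllegalStateException e) {
                    if (attempt >= maxRetries) throw e;
                    // Send failed: refresh the leader info and retry.
                    cachedLeader = client.getMetadata(topic, partition);
                }
            }
        }
    }

    public static void main(String[] args) {
        FakeCluster cluster = new FakeCluster(1);
        SketchProducer producer = new SketchProducer(cluster, "events", 0);
        producer.send("a", 1);   // delivered to broker 1
        cluster.failOverTo(2);   // leader moves; producer's cache is now stale
        producer.send("b", 1);   // first attempt fails, refresh, then broker 2
        System.out.println(cluster.delivered);
    }
}
```

The point of the sketch is that the producer never watches ZooKeeper: it
learns about a leader change only when a send fails, and then asks a broker
for fresh metadata.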

Thanks,

Jun

On Tue, Nov 20, 2012 at 7:54 AM, David Arthur <[EMAIL PROTECTED]> wrote:

>
> On Nov 20, 2012, at 12:23 AM, Jun Rao wrote:
>
> > Jason,
> >
> > In 0.8, producer doesn't use zkclient at all. You just need to set
> > broker.list.
>
> This seems like a regression in functionality. For me, one of the benefits
> of Kafka is only needing to know about ZooKeeper.
>
> > A number of things have changed in 0.8. First, the number of
> > partitions of a topic is global in a cluster and doesn't really change
> > as new brokers are added. Second, a partition is assigned to multiple
> > brokers for replication, and one of the replicas is the leader, which
> > serves writes. When a producer starts up, it first uses the getMetadata
> > api to figure out the replica assignment for the relevant
> > topic/partition. It then issues producer requests directly to the broker
> > where the leader resides. If the leader broker goes down, the producer
> > gets an exception and re-issues the getMetadata request to obtain
> > information about the new leader.
> >
> > Thanks,
> >
> > Jun
> >
> > On Mon, Nov 19, 2012 at 1:29 PM, Jason Rosenberg <[EMAIL PROTECTED]>
> > wrote:
> >
> >> Well, they do use zk though, to get the initial list of kafka nodes, and
> >> while zk is available, presumably they do use it to keep up with the
> >> dynamically changing set of kafka brokers, no?  You are just saying that
> >> if zk goes away, 0.8 producers can keep on producing, as long as the
> >> kafka cluster remains stable?
> >>
> >> Jason
> >>
> >> On Mon, Nov 19, 2012 at 12:20 PM, Neha Narkhede <[EMAIL PROTECTED]>
> >> wrote:
> >>
> >>> In 0.8, producers don't use zk. When producers encounter an error
> >>> while sending data, they use a special getMetadata request to refresh
> >>> the kafka cluster info from a randomly selected Kafka broker, and
> >>> retry sending the data.
> >>>
> >>> Thanks,
> >>> Neha
> >>>
> >>> On Mon, Nov 19, 2012 at 12:10 PM, Jason Rosenberg <[EMAIL PROTECTED]>
> >>> wrote:
> >>>> Are you saying that in 0.8, producers don't use zkclient?  Or don't
> >>>> need it?  How can a producer dynamically respond to a change in the
> >>>> kafka cluster without zk?
> >>>>
> >>>> On Mon, Nov 19, 2012 at 8:07 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
> >>>>
> >>>>> Jae,
> >>>>>
> >>>>> In 0.8, producers don't need the ZK client anymore. Instead, they use
> >>>>> a new getMetadata api to get topic/partition/leader information from
> >>>>> the broker. Consumers still need the ZK client. We plan to redesign
> >>>>> the consumer post 0.8 and can keep this in mind.
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> Jun
> >>>>>
> >>>>> On Sun, Nov 18, 2012 at 10:35 PM, Bae, Jae Hyeon <[EMAIL PROTECTED]>
> >>>>> wrote:
> >>>>>
> >>>>>> I suggest that Kafka create only one ZkClient instance globally:
> >>>>>> ZkClient is thread-safe, and a single instance would make it easier
> >>>>>> for users to customize the Kafka source code for ZooKeeper.
> >>>>>>
> >>>>>> In our company's cloud environment, it is not recommended to create