Kafka, mail # user - Kafka wiki Documentation conventions - looking for feedback


Re: Kafka wiki Documentation conventions - looking for feedback
Jun Rao 2013-04-29, 15:58
Basically, every time a consumer joins a group, every consumer in the
group gets a ZK notification, and each of them tries to claim a subset of
the topic's partitions. A given partition is assigned to only one consumer
in the group. Once ownership is determined, each consumer consumes messages
from its partitions and manages the offsets of those partitions. Since a
partition is owned by only one consumer at any given point in time, there
won't be conflicts when updating the offsets. More details are in the
"consumer rebalancing algorithm" section of
http://kafka.apache.org/07/design.html
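
To make the ownership idea concrete, here is a minimal sketch of a range-style split. It is not Kafka's actual code: the class name, method name, and consumer ids are made up for illustration, and it ignores ZooKeeper entirely. It only shows that when every consumer sorts the same membership list and applies the same rule, each partition ends up with exactly one owner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; none of these names come from Kafka.
public class RebalanceSketch {

    // Splits partitions 0..partitionCount-1 into contiguous ranges,
    // one range per consumer, so that no partition has two owners.
    static Map<String, List<Integer>> assign(List<String> consumerIds, int partitionCount) {
        Map<String, List<Integer>> owned = new LinkedHashMap<>();
        if (consumerIds.isEmpty()) {
            return owned; // nobody to own anything
        }

        // Every consumer must see the same order to reach the same answer.
        List<String> consumers = new ArrayList<>(consumerIds);
        Collections.sort(consumers);

        int perConsumer = partitionCount / consumers.size();
        int extra = partitionCount % consumers.size(); // first `extra` consumers get one more

        int next = 0;
        for (int i = 0; i < consumers.size(); i++) {
            int count = perConsumer + (i < extra ? 1 : 0);
            List<Integer> range = new ArrayList<>();
            for (int j = 0; j < count; j++) {
                range.add(next++);
            }
            owned.put(consumers.get(i), range);
        }
        return owned;
    }

    public static void main(String[] args) {
        // One consumer in the group: it owns every partition.
        System.out.println(assign(Arrays.asList("consumer-a"), 4));
        // A second consumer joins: the partitions are split between the two.
        System.out.println(assign(Arrays.asList("consumer-a", "consumer-b"), 4));
    }
}
```

With four partitions, the first call gives all four to the single consumer. After the second consumer joins, each owns two, so only the current owner of a partition ever writes that partition's offset.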

Thanks,

Jun
On Mon, Apr 29, 2013 at 8:16 AM, Chris Curtin <[EMAIL PROTECTED]> wrote:

> Jun, can you explain this a little better? I thought when using Consumer
> Groups that on startup Kafka connects to ZooKeeper and finds the last read
> offset for every partition in the topic being requested for the group. That
> is then the starting point for the consumer threads.
>
> If a second process starts while the first one is running with the same
> Consumer Group, won't the second one read the last offsets consumed by the
> already running process and start processing from there? Then as the first
> process syncs consumed offsets, won't the 2nd process's next update
> overwrite them?
>
> Thanks,
>
> Chris
>
>
>
>
> On Mon, Apr 29, 2013 at 11:03 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
>
> > Chris,
> >
> > Thanks for the writeup. Looks great overall. A couple of comments.
> >
> > 1. At the beginning, it sounds like one can't run multiple consumer
> > processes in the same group. This is actually not true. We can create
> > multiple instances of consumers for the same group in the same JVM or
> > different JVMs. The consumers will auto-balance among themselves.
> >
> > 2. We have changed the names of some config properties.
> > auto.commit.interval.ms is correct. However, zk.connect,
> > zk.session.timeout.ms, and zk.sync.time.ms have been changed to
> > zookeeper.connect, zookeeper.session.timeout.ms, and
> > zookeeper.sync.time.ms, respectively.
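
For reference, a small sketch of what the renamed keys look like when set in code. It uses only java.util.Properties; the class name is made up, and the host, port, and millisecond values are placeholders rather than recommendations. A real consumer would also need its other settings, which are left out here.

```java
import java.util.Properties;

// Hypothetical helper; only the four property names come from this thread.
public class ConsumerPropsSketch {

    static Properties consumerProps() {
        Properties props = new Properties();
        // New names; the values below are placeholders.
        props.setProperty("zookeeper.connect", "localhost:2181");   // was zk.connect
        props.setProperty("zookeeper.session.timeout.ms", "400");   // was zk.session.timeout.ms
        props.setProperty("zookeeper.sync.time.ms", "200");         // was zk.sync.time.ms
        props.setProperty("auto.commit.interval.ms", "1000");       // spelling unchanged
        return props;
    }

    public static void main(String[] args) {
        // Print the properties to check the spelling of each key.
        consumerProps().list(System.out);
    }
}
```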
> >
> > I will add a link to your wiki on our website.
> >
> > Thanks again.
> >
> > Jun
> >
> >
> > On Mon, Apr 29, 2013 at 5:54 AM, Chris Curtin <[EMAIL PROTECTED]> wrote:
> >
> > > Hi Jun,
> > >
> > > I finished and published it this morning:
> > >
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
> > >
> > > One question: when documenting the ConsumerConfig parameters I couldn't
> > > find a description for the 'auto.commit.interval.ms' setting. I found
> > > one for 'autocommit.interval.ms' (no '.' between auto and commit) in the
> > > Google Cache only. Which spelling is it? Also, is my description of it
> > > correct?
> > >
> > > I'll take a look at custom encoders later this week. Today and Tuesday
> > > are going to be pretty busy.
> > >
> > > Please let me know if there are changes needed to the High Level
> > > Consumer page.
> > >
> > > Thanks,
> > >
> > > Chris
> > >
> > >
> > > On Mon, Apr 29, 2013 at 12:50 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
> > >
> > > > Chris,
> > > >
> > > > Any update on the high level consumer example?
> > > >
> > > > Also, in the Producer example, it would be useful to describe how to
> > > > write a customized encoder. One subtle thing is that the encoder needs
> > > > a constructor that takes a single VerifiableProperties argument
> > > > (https://issues.apache.org/jira/browse/KAFKA-869).
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > >
> > > >
> > > >
> > >
> >
>