

Re: Consumer group concept
Hi Jeff,

Load balancing is done by range partitioning the available partitions for
the topic across the consumer processes (streams). The algorithm is given
at the very end of the design document:
http://incubator.apache.org/kafka/design.html - but here's a quick example.
If you have four nodes, and two message streams per node (i.e., each node's
consumer config is "foo":2) this means there are eight consumer streams in
total. The available partitions for "foo" are allocated to these eight
streams using the rebalancing algorithm. For example, if there are eight
available partitions on the brokers then each consumer stream will get one
partition. If there are fewer than eight, some of the consumer streams will
not get any data. If there are more than eight, then some streams will get
more than one partition (if # partitions % # streams == 0 then it will be
evenly spread, and skewed otherwise).
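
To make the arithmetic above concrete, here is a minimal sketch of a
range-style assignment. It is not Kafka's actual code: the function name
range_assign and the stream names are made up for illustration, and it
assumes that leftover partitions go to the first streams in sorted order.

```python
def range_assign(partitions, streams):
    """Give each stream a contiguous range of the sorted partitions."""
    partitions = sorted(partitions)
    streams = sorted(streams)
    # base = partitions every stream gets; extra = leftover partitions,
    # handed out one each to the first `extra` streams (an assumption).
    base, extra = divmod(len(partitions), len(streams))
    assignment = {}
    for i, stream in enumerate(streams):
        start = base * i + min(i, extra)
        count = base + (1 if i < extra else 0)
        assignment[stream] = partitions[start:start + count]
    return assignment


# Four nodes with two streams each ("foo":2), i.e. eight streams in total.
# These names are hypothetical.
streams = [f"node{n}-stream{s}" for n in range(1, 5) for s in range(2)]

for num_partitions in (8, 6, 10):
    result = range_assign(range(num_partitions), streams)
    print(num_partitions, [len(p) for p in result.values()])
```

With 8 partitions every stream gets one; with 6, the last two streams get
nothing; with 10, the first two streams get two partitions each and the
rest get one.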

Thanks,

Joel

On Tue, Jun 12, 2012 at 1:55 PM, Rodenburg, Jeff <[EMAIL PROTECTED]> wrote:

> Great, I'm running the quick start and can see that in operation.
>
> Ok, last question on this thread:
>
> > So if you have two consumer groups consuming a topic, and each consumer
> > group has 4 machines in it, then a message published to this topic would
> > be delivered to one machine in each of the two groups.
>
> How is topic load-balancing for consumers handled?  For example, if a
> consumer group has 4 machines in it (consumer per machine), in reality only
> one machine in the group is actually working.  If I want multiple machines
> handling items in a topic, how is that approach handled? I could see
> producers generating more topics, and consumers subscribing to those
> (making a high-volume topic more granular).  What's best practice when
> consumer tasks on topic messages need to be handled by multiple consumers?
>
> -Jeff
>
>
>
>
>
> On Jun 12, 2012, at 11:46 AM, Jay Kreps wrote:
>
> > Basically the rule is this: "every message sent to the topic is delivered
> > to one machine/process in each consumer group". So if you have two
> > consumer groups consuming a topic, and each consumer group has 4 machines
> > in it, then a message published to this topic would be delivered to one
> > machine in each of the two groups.
> >
> > -Jay
> >
> > On Tue, Jun 12, 2012 at 11:34 AM, Rodenburg, Jeff <
> > [EMAIL PROTECTED]> wrote:
> >
> >> Thanks for the info, Jun.
> >>
> >>> if you just want each message to be consumed by a consumer, not a
> >>> particular one
> >>
> >> What is intended to be a particular consumer? Something on the order of
> >> Consumer #3 within a group needs message #123?
> >>
> >> Ok, next question:
> >>
> >> What is the relationship between topics and consumer groups? More to the
> >> point, can I have multiple consumer groups that all consume the same
> >> topic? For example, assume a set of producers are publishing to the topic
> >> "ABC". Suppose I have multiple processes that take action on a given ABC
> >> message -- process 1 handles billing, process 2 handles file management,
> >> process 3 handles history/archiving, etc.  Can I structure multiple groups
> >> that consume the same topic? How does partitioning work at that point?
> >>
> >>
> >>
> >>
> >> On Jun 12, 2012, at 10:11 AM, Jun Rao wrote:
> >>
> >>> Jeff,
> >>>
> >>> Your understanding is correct. Operations-wise, we have some JMX that
> >>> gives consumer stats per topic. There is also a tool, CheckOffsetLag,
> >>> that tells you how far behind a consumer is. For coordination between
> >>> producers and consumers, if you just want each message to be consumed
> >>> by a consumer, not a particular one, there is no coordination needed.
> >>>
> >>> Thanks,
> >>>
> >>> Jun
> >>>
> >>> On Tue, Jun 12, 2012 at 9:58 AM, Rodenburg, Jeff <[EMAIL PROTECTED]>
> >>> wrote:
> >>>
> >>>> Hi all -
> >>>>
> >>>> Just getting familiar with Kafka, and learning about consumer groups.
> >>>> Hoping someone can provide some context here.