Kafka, mail # user - Consumer rebalance per topic


Re: Consumer rebalance per topic
Neha Narkhede 2013-01-07, 18:44
Pablo,

That is a good suggestion. Ideally, the partitions across all topics should
be distributed evenly across consumer streams instead of a per-topic based
decision. There is no particular advantage to the current scheme of
per-topic rebalancing that I can think of. Would you mind filing a JIRA to
track this improvement?
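
The difference between the two schemes can be sketched with the numbers from the original question (100 topics, 2 partitions each, 10 consumers). This is a simplified model and not Kafka's actual rebalance code: the function names, the topic and consumer ids, and the round-robin pooling used for the cross-topic variant are all assumptions made for illustration.

```python
from collections import defaultdict


def assign_per_topic(topics, consumers):
    """Split each topic's partitions into contiguous ranges, one topic at a time."""
    owned = defaultdict(list)
    ordered = sorted(consumers)
    for topic in sorted(topics):
        partitions = list(range(topics[topic]))
        base, extra = divmod(len(partitions), len(ordered))
        for i, consumer in enumerate(ordered):
            # The first `extra` consumers get one partition more than the rest.
            start = base * i + min(i, extra)
            count = base + (1 if i < extra else 0)
            owned[consumer].extend(
                (topic, p) for p in partitions[start:start + count]
            )
    return owned


def assign_across_topics(topics, consumers):
    """Hypothetical alternative: pool all (topic, partition) pairs, deal round-robin."""
    owned = defaultdict(list)
    ordered = sorted(consumers)
    pool = [(t, p) for t in sorted(topics) for p in range(topics[t])]
    for n, topic_partition in enumerate(pool):
        owned[ordered[n % len(ordered)]].append(topic_partition)
    return owned


# Illustrative ids; only the counts come from the thread.
topics = {f"topic-{n:03d}": 2 for n in range(100)}
consumers = [f"consumer-{n:02d}" for n in range(10)]

per_topic = assign_per_topic(topics, consumers)
pooled = assign_across_topics(topics, consumers)

busy_per_topic = [c for c in consumers if per_topic[c]]
busy_pooled = [c for c in consumers if pooled[c]]

print("per-topic: consumers with data =", len(busy_per_topic))
print("pooled:    consumers with data =", len(busy_pooled))
```

With per-topic assignment the same two consumers (the first two in sorted order) own every partition, while pooling spreads the 200 partitions as 20 per consumer. The pooled variant balances partition counts only; it does not address the concern about topics of different sizes.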

Thanks,
Neha
On Mon, Jan 7, 2013 at 9:10 AM, Jun Rao <[EMAIL PROTECTED]> wrote:

> Pablo,
>
> Currently, a partition is the smallest unit by which we distribute data
> among consumers (in the same consumer group). So, if the number of
> consumers is larger than the total number of partitions of a topic in the
> Kafka cluster (across all brokers), some consumers will never get any data
> from that topic. This assignment is done on a per-topic basis. If a
> consumer consumes multiple topics, it would make sense to divide the
> partitions of all topics among the consumers. We haven't done that yet.
> Part of the reason is that we need to figure out how to balance the data
> across topics, since they can be of different sizes. We can look into that
> post 0.8.
>
> For now, the solution is to increase the number of partitions on the
> broker.
>
> Thanks,
>
> Jun
>
> On Mon, Jan 7, 2013 at 9:03 AM, Pablo Barrera González <
> [EMAIL PROTECTED]> wrote:
>
> > Hello
> >
> > We are starting to use Kafka in production, but we found an unexpected
> > (at least for me) behavior in the use of partitions. We have a bunch of
> > topics with a few partitions each. We try to consume all data with
> > several consumers (just one consumer group).
> >
> > The problem is in the rebalance step. The rebalance splits the partitions
> > per topic between all consumers. So if you have 100 topics with only 2
> > partitions each and 10 consumers, only two consumers will be used. That
> > is, for each topic all partitions will be listed and shared between the
> > consumers in the consumer group in order (not randomly).
> >
> > This behavior is also described in algorithm 1 of the original Kafka
> > paper [1].
> >
> > I don't understand this decision. Why is it split by topic? Wouldn't it
> > make sense to divide all partitions from all topics between all the
> > consumers in the consumer group? I don't see the reason for this, so I
> > would like to hear your opinion before changing the code.
> >
> > We are using Kafka 0.7.1.
> >
> > Thank you in advance
> >
> > Pablo
> >
> > [1] "Kafka: a Distributed Messaging System for Log Processing", Jay
> > Kreps, Neha Narkhede and Jun Rao.
> > http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
> >
>