That is a good suggestion. Ideally, the partitions across all topics should
be distributed evenly across consumer streams rather than decided on a
per-topic basis. There is no particular advantage to the current scheme of
per-topic rebalancing that I can think of. Would you mind filing a JIRA to
track this improvement?
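
For illustration, here is a minimal Python sketch of the difference, using the numbers from the original question (100 topics, 2 partitions each, 10 consumers). It is not the actual Kafka consumer code: the function names `assign_per_topic` and `assign_across_topics` are hypothetical, the per-topic function assumes a simple in-order range split as described in the quoted message, and the cross-topic function assumes a plain round-robin, which is only one possible way to do it.

```python
from collections import defaultdict


def assign_per_topic(topics, consumers):
    """Split each topic's partitions over the sorted consumers, one topic at a time.

    Assumption: an in-order range split per topic, as described in the thread.
    """
    assignment = defaultdict(list)
    ordered = sorted(consumers)
    for topic, n_partitions in sorted(topics.items()):
        per_consumer, extra = divmod(n_partitions, len(ordered))
        start = 0
        for i, consumer in enumerate(ordered):
            # The first `extra` consumers take one additional partition.
            count = per_consumer + (1 if i < extra else 0)
            for partition in range(start, start + count):
                assignment[consumer].append((topic, partition))
            start += count
    return dict(assignment)


def assign_across_topics(topics, consumers):
    """Hypothetical alternative: round-robin over all (topic, partition) pairs."""
    assignment = defaultdict(list)
    ordered = sorted(consumers)
    all_partitions = [
        (topic, partition)
        for topic, n_partitions in sorted(topics.items())
        for partition in range(n_partitions)
    ]
    for i, topic_partition in enumerate(all_partitions):
        assignment[ordered[i % len(ordered)]].append(topic_partition)
    return dict(assignment)


topics = {f"topic-{i:03d}": 2 for i in range(100)}
consumers = [f"consumer-{i:02d}" for i in range(10)]

per_topic = assign_per_topic(topics, consumers)
across = assign_across_topics(topics, consumers)

print(len(per_topic))  # 2: only two consumers ever receive partitions
print(len(across))     # 10: every consumer receives 20 partitions
```

Note that the round-robin variant balances partition counts only. It does not account for topics of different sizes, which is the open question raised in the quoted reply below.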
On Mon, Jan 7, 2013 at 9:10 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
> Currently, a partition is the smallest unit by which we distribute data among
> consumers (in the same consumer group). So, if the number of consumers is larger
> than the total number of partitions in a Kafka cluster (across all
> brokers), some consumers will never get any data. Such a decision is made
> on a per-topic basis. If a consumer consumes multiple topics, it would make
> sense to divide partitions across all topics among consumers. We haven't done
> that yet. Part of the reason is that we need to figure out how to balance
> the data across topics since they can be of different sizes. We can look
> into that post 0.8.
> For now, the solution is to increase the number of partitions on the
> On Mon, Jan 7, 2013 at 9:03 AM, Pablo Barrera González <
> [EMAIL PROTECTED]> wrote:
> > Hello
> > We are starting to use Kafka in production but we found an unexpected (at
> > least for me) behavior with the use of partitions. We have a bunch of
> > topics with a few partitions each. We try to consume all data from
> > consumers (just one consumer group).
> > The problem is in the rebalance step. The rebalance splits the partitions
> > per topic between all consumers. So if you have 100 topics with only 2
> > partitions each and 10 consumers, only two consumers will be used. That is,
> > for each topic all partitions will be listed and shared between the
> > consumers in the consumer group in order (not randomly).
> > This behavior is also described in algorithm 1 of the original Kafka
> > paper (cited below).
> > I don't understand this decision. Why is it split by topic? Does it make
> > sense to divide all partitions from all topics between all the consumers in the
> > consumer group? I don't see the reason for this, so I would like to hear your
> > opinion before changing the code.
> > We are using kafka 0.7.1.
> > Thank you in advance
> > Pablo
> > "Kafka: a Distributed Messaging System for Log Processing", Jay Kreps,
> > Neha Narkhede and Jun Rao.