Re: Consumer rebalance per topic
(From http://kafka.apache.org/design.html) One potential benefit of the
existing rebalancing logic is that it reduces the number of connections to
brokers per consumer instance. However, if you have a large number of
partitions and few brokers and/or consumer instances, it wouldn't really
help, so I agree it would be good to implement KAFKA-687. KAFKA-564
<https://issues.apache.org/jira/browse/KAFKA-564> may also be related,
i.e., it may be easier to implement along with or after KAFKA-687.
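
The connection trade-off above can be illustrated with a small simulation. This is a minimal Python sketch, not Kafka's code, and it rests on stated assumptions: the partition layout (every broker hosts the same number of partitions of each topic, listed broker by broker), the range-style per-topic split, and the all-topics round-robin alternative are hypothetical stand-ins chosen for illustration, as are the helper names `layout`, `per_topic_range`, `all_topics_round_robin` and `brokers_per_consumer` and the topic and consumer names. It counts how many distinct brokers each consumer would fetch from under each scheme.

```python
from collections import defaultdict


def layout(topic, num_brokers, parts_per_broker):
    # Hypothetical layout: every broker hosts `parts_per_broker` partitions of
    # the topic, and the topic's partitions are listed broker by broker.
    return [(topic, b, p) for b in range(num_brokers) for p in range(parts_per_broker)]


def per_topic_range(topics, consumers, num_brokers, parts_per_broker):
    # Assumed range-style split, computed separately for each topic: each
    # consumer takes a contiguous chunk, and the first `extra` take one more.
    ordered = sorted(consumers)
    owned = defaultdict(list)
    for topic in topics:
        parts = layout(topic, num_brokers, parts_per_broker)
        per, extra = divmod(len(parts), len(ordered))
        for i, c in enumerate(ordered):
            start = per * i + min(i, extra)
            owned[c].extend(parts[start:start + per + (1 if i < extra else 0)])
    return owned


def all_topics_round_robin(topics, consumers, num_brokers, parts_per_broker):
    # Hypothetical cross-topic alternative: deal out every partition in turn.
    ordered = sorted(consumers)
    owned = defaultdict(list)
    every = [tp for t in topics for tp in layout(t, num_brokers, parts_per_broker)]
    for k, tp in enumerate(every):
        owned[ordered[k % len(ordered)]].append(tp)
    return owned


def brokers_per_consumer(owned):
    # One connection per distinct broker a consumer fetches from.
    return {c: len({b for _, b, _ in parts}) for c, parts in sorted(owned.items())}


topics = [f"topic-{i}" for i in range(20)]
consumers = ["c0", "c1", "c2"]
for num_brokers, parts_per_broker in [(4, 1), (2, 8)]:
    args = (topics, consumers, num_brokers, parts_per_broker)
    print(num_brokers, "brokers:",
          brokers_per_consumer(per_topic_range(*args)),
          brokers_per_consumer(all_topics_round_robin(*args)))
```

With four brokers and one partition per broker, the per-topic split keeps each consumer on one or two brokers, while the round-robin puts every consumer on all four. With two brokers and eight partitions per broker, no consumer can need more than two connections under either scheme, so the per-topic split saves little.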

Joel
On Mon, Jan 7, 2013 at 10:44 AM, Neha Narkhede <[EMAIL PROTECTED]> wrote:

> Pablo,
>
> That is a good suggestion. Ideally, the partitions across all topics should
> be distributed evenly across consumer streams, rather than decided on a
> per-topic basis. There is no particular advantage to the current scheme of
> per-topic rebalancing that I can think of. Would you mind filing a JIRA to
> track this improvement?
>
> Thanks,
> Neha
>
>
> On Mon, Jan 7, 2013 at 9:10 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
>
> > Pablo,
> >
> > Currently, the partition is the smallest unit by which we distribute data
> > among consumers (in the same consumer group). So, if the number of
> > consumers is larger than the total number of partitions in a Kafka cluster
> > (across all brokers), some consumers will never get any data. This
> > decision is made on a per-topic basis. If a consumer consumes multiple
> > topics, it would make sense to divide the partitions of all topics among
> > the consumers. We haven't done that yet. Part of the reason is that we
> > need to figure out how to balance the data across topics, since they can
> > be of different sizes. We can look into that post 0.8.
> >
> > For now, the solution is to increase the number of partitions on the
> > broker.
> >
> > Thanks,
> >
> > Jun
> >
> > On Mon, Jan 7, 2013 at 9:03 AM, Pablo Barrera González <[EMAIL PROTECTED]> wrote:
> >
> > > Hello
> > >
> > > We are starting to use Kafka in production, but we have found an
> > > unexpected (at least to me) behavior in how partitions are used. We
> > > have a bunch of topics with a few partitions each. We try to consume all
> > > data from several consumers (just one consumer group).
> > >
> > > The problem is in the rebalance step. The rebalance splits the
> > > partitions of each topic separately among all consumers. So if you have
> > > 100 topics with only 2 partitions each and 10 consumers, only two
> > > consumers will be used. That is, for each topic, all partitions are
> > > listed and shared among the consumers in the consumer group in order
> > > (not randomly).
> > >
> > > This behavior is also described in Algorithm 1 of the original Kafka
> > > paper [1].
> > >
> > > I don't understand this decision. Why is it split by topic? Would it
> > > make sense to divide all partitions from all topics among all the
> > > consumers in the consumer group? I don't see the reason for this, so I
> > > would like to hear your opinion before changing the code.
> > >
> > > We are using Kafka 0.7.1.
> > >
> > > Thank you in advance
> > >
> > > Pablo
> > >
> > > [1] "Kafka: a Distributed Messaging System for Log Processing", Jay
> > > Kreps, Neha Narkhede and Jun Rao.
> > > http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
> > >
> >
>
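
The per-topic behavior Pablo describes can be reproduced with a few lines of Python. This is a minimal sketch, not Kafka's implementation: `per_topic_range_assign` assumes a range-style split (sort a topic's partitions and the group's consumers, then hand each consumer a contiguous chunk, with any remainder going to the first consumers), and `all_topics_round_robin_assign` is a hypothetical cross-topic alternative of the kind Jun and Neha suggest. Both function names and all topic and consumer names are made up for illustration.

```python
from collections import defaultdict


def per_topic_range_assign(topics, consumers):
    """Range-style split done independently for each topic (assumed behavior).

    topics: dict of topic name -> number of partitions.
    consumers: ids of one consumer group, all subscribed to every topic.
    Returns a dict of consumer id -> list of (topic, partition) pairs.
    """
    ordered = sorted(consumers)
    owned = defaultdict(list)
    for topic in sorted(topics):
        per_consumer, extra = divmod(topics[topic], len(ordered))
        for i, consumer in enumerate(ordered):
            # The first `extra` consumers each take one additional partition.
            start = per_consumer * i + min(i, extra)
            count = per_consumer + (1 if i < extra else 0)
            if count == 0:
                continue  # This consumer gets nothing from this topic.
            owned[consumer].extend((topic, p) for p in range(start, start + count))
    return dict(owned)


def all_topics_round_robin_assign(topics, consumers):
    """Hypothetical alternative: deal out every topic-partition across the group."""
    ordered = sorted(consumers)
    every = [(t, p) for t in sorted(topics) for p in range(topics[t])]
    owned = defaultdict(list)
    for k, tp in enumerate(every):
        owned[ordered[k % len(ordered)]].append(tp)
    return dict(owned)


# The scenario from the thread: 100 topics, 2 partitions each, 10 consumers.
topics = {f"topic-{i:03d}": 2 for i in range(100)}
consumers = [f"consumer-{i}" for i in range(10)]

per_topic = per_topic_range_assign(topics, consumers)
spread = all_topics_round_robin_assign(topics, consumers)
print(sorted(per_topic))                        # ['consumer-0', 'consumer-1']
print(sorted(len(v) for v in spread.values()))  # [20, 20, 20, 20, 20, 20, 20, 20, 20, 20]
```

Under the per-topic split, only two of the ten consumers receive any partitions, each holding 100; dealing out all 200 topic-partitions together gives every consumer 20. This round-robin balances partition counts only; as Jun points out, topics can differ in size, so equal partition counts need not mean equal load.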