Kafka >> mail # user >> Consumer rebalance per topic


Pablo Barrera González 2013-01-07, 17:04
Jun Rao 2013-01-07, 17:11
Neha Narkhede 2013-01-07, 18:44
Re: Consumer rebalance per topic
Thank you Jun and Neha

I was trying to avoid adding more partitions. I have enough partitions if
you count all partitions in all topics. I understand the problem with
different data load per topic, but the current scheme does not solve this
problem either, so we shouldn't be any worse off if we consider all
partitions from all topics at the same time.

I will open the JIRA ticket to track this.

Thanks again for the clarification.

Cheers

Pablo

2013/1/7 Neha Narkhede <[EMAIL PROTECTED]>

> Pablo,
>
> That is a good suggestion. Ideally, the partitions across all topics should
> be distributed evenly across consumer streams instead of being assigned on
> a per-topic basis. There is no particular advantage to the current scheme
> of per-topic rebalancing that I can think of. Would you mind filing a JIRA
> to track this improvement?
>
> Thanks,
> Neha
>
>
> On Mon, Jan 7, 2013 at 9:10 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
>
> > Pablo,
> >
> > Currently, a partition is the smallest unit by which we distribute data
> > among consumers (in the same consumer group). So, if the number of
> > consumers is larger than the total number of partitions in a Kafka cluster
> > (across all brokers), some consumers will never get any data. This
> > decision is made on a per-topic basis. If a consumer consumes multiple
> > topics, it would make sense to divide the partitions across all topics
> > among the consumers. We haven't done that yet. Part of the reason is that
> > we need to figure out how to balance the data across topics, since they
> > can be of different sizes. We can look into that post 0.8.
> >
> > For now, the solution is to increase the number of partitions on the
> > broker.
> >
> > Thanks,
> >
> > Jun
> >
> > On Mon, Jan 7, 2013 at 9:03 AM, Pablo Barrera González <
> > [EMAIL PROTECTED]> wrote:
> >
> > > Hello
> > >
> > > We are starting to use Kafka in production, but we found an unexpected
> > > (at least for me) behavior with the use of partitions. We have a bunch
> > > of topics with a few partitions each. We try to consume all data from
> > > several consumers (just one consumer group).
> > >
> > > The problem is in the rebalance step. The rebalance splits the
> > > partitions per topic between all consumers. So if you have 100 topics
> > > with only 2 partitions each and 10 consumers, only two consumers will
> > > be used. That is, for each topic, all partitions are listed and shared
> > > between the consumers in the consumer group in order (not randomly).
> > >
> > > This behavior is also described in Algorithm 1 of the original Kafka
> > > paper [1].
> > >
> > > I don't understand this decision. Why is it split by topic? Wouldn't it
> > > make sense to divide all partitions from all topics between all the
> > > consumers in the consumer group? I don't see the reason for this, so I
> > > would like to hear your opinion before changing the code.
> > >
> > > We are using Kafka 0.7.1.
> > >
> > > Thank you in advance
> > >
> > > Pablo
> > >
> > > [1] "Kafka: a Distributed Messaging System for Log Processing", Jay
> > > Kreps, Neha Narkhede and Jun Rao.
> > > http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
> > >
> >
>
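
The difference discussed in this thread can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Kafka's rebalance code: the function names (assign_per_topic, assign_across_topics), the topic and consumer names, and the round-robin pooling strategy are all hypothetical and chosen for illustration. The per-topic function models the behavior Pablo describes (each topic's partitions are handed out in order to the sorted consumers), and the pooled function models the proposal to consider all partitions from all topics together. It does not address the differing data load per topic that Jun mentions.

```python
from collections import defaultdict


def assign_per_topic(topics, consumers):
    """Hand out each topic's partitions independently, in consumer order.

    `topics` maps a topic name to its partition count (hypothetical model).
    """
    ordered = sorted(consumers)
    assignment = defaultdict(list)
    for topic in sorted(topics):
        # Split this topic's partitions into contiguous ranges; the first
        # `extra` consumers get one additional partition.
        base, extra = divmod(topics[topic], len(ordered))
        start = 0
        for pos, consumer in enumerate(ordered):
            count = base + (1 if pos < extra else 0)
            for partition in range(start, start + count):
                assignment[consumer].append((topic, partition))
            start += count
    return {c: assignment[c] for c in ordered}


def assign_across_topics(topics, consumers):
    """Pool all (topic, partition) pairs and deal them out round-robin."""
    ordered = sorted(consumers)
    pooled = [(t, p) for t in sorted(topics) for p in range(topics[t])]
    assignment = {c: [] for c in ordered}
    for i, topic_partition in enumerate(pooled):
        assignment[ordered[i % len(ordered)]].append(topic_partition)
    return assignment


# The scenario from the original message: 100 topics, 2 partitions each,
# 10 consumers in a single consumer group.
topics = {f"topic-{i:03d}": 2 for i in range(100)}
consumers = [f"consumer-{i:02d}" for i in range(10)]

per_topic = assign_per_topic(topics, consumers)
pooled = assign_across_topics(topics, consumers)

busy_per_topic = sum(1 for parts in per_topic.values() if parts)
busy_pooled = sum(1 for parts in pooled.values() if parts)
print(busy_per_topic, busy_pooled)  # prints "2 10"
```

In this model the per-topic scheme leaves eight of the ten consumers idle, while pooling gives every consumer 20 partitions. Round-robin is just one possible way to pool; it balances partition counts, not bytes.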

 
Pablo Barrera González 2013-01-08, 09:29
Joel Koshy 2013-01-08, 20:08