Kafka >> mail # user >> Consumer rebalance per topic


Pablo Barrera González 2013-01-07, 17:04
Jun Rao 2013-01-07, 17:11
Neha Narkhede 2013-01-07, 18:44
Pablo Barrera González 2013-01-07, 19:11
Pablo Barrera González 2013-01-08, 09:29

Re: Consumer rebalance per topic
(From http://kafka.apache.org/design.html) One potential benefit of the
existing rebalancing logic is that it reduces the number of connections to
brokers per consumer instance. However, if you have a large number of
partitions and few brokers and/or consumer instances, then it wouldn't
really help, so I agree it would be good to implement KAFKA-687.
KAFKA-564 <https://issues.apache.org/jira/browse/KAFKA-564> may
also be related; it may be easier to implement along with or after
KAFKA-687.
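
The connection-count point can be made concrete with a small model. This is a hypothetical illustration, not Kafka's fetcher code: it assumes a consumer opens one connection per distinct broker hosting at least one of its partitions, and it uses a made-up layout of 2 brokers where partition `p` of every topic lives on broker `p`. The names `broker_connections` and `broker_of` are invented for this sketch.

```python
from typing import Dict, List, Tuple

# (topic, partition id)
TopicPartition = Tuple[str, int]


def broker_connections(assigned: List[TopicPartition],
                       broker_of: Dict[TopicPartition, int]) -> int:
    """Count the distinct brokers a consumer would fetch from.

    Hypothetical model: one connection per broker that hosts any of the
    consumer's partitions.
    """
    return len({broker_of[tp] for tp in assigned})


# Hypothetical cluster: 2 brokers, 100 topics with 2 partitions each;
# partition p of every topic is hosted on broker p.
broker_of = {(f"topic{t:03d}", p): p for t in range(100) for p in range(2)}

# A consumer owning partition 0 of every topic (what a per-topic range
# split gives the first of 10 consumers when each topic has 2 partitions).
only_first_partitions = [(f"topic{t:03d}", 0) for t in range(100)]

# A consumer owning 20 partitions spread over both brokers.
mixed = [(f"topic{t:03d}", t % 2) for t in range(20)]

print(broker_connections(only_first_partitions, broker_of))  # 1
print(broker_connections(mixed, broker_of))                  # 2
```

Whatever the assignment, the count can never exceed the number of brokers, so with only a few brokers the saving from keeping a topic's partitions together is at most a connection or two per consumer.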

Joel
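
The cross-topic split that Neha and Pablo describe in the quoted messages can be sketched as follows. This is a hypothetical round-robin illustration, not the design adopted for KAFKA-687: it pools every (topic, partition) pair, sorts them, and deals them out to the sorted consumers in turn. `cross_topic_round_robin` is an invented name, and the sketch balances partition counts only, not data volume, which is the open question Jun raises about topics of different sizes.

```python
from typing import Dict, List, Tuple

# (topic, partition id)
TopicPartition = Tuple[str, int]


def cross_topic_round_robin(partitions_per_topic: Dict[str, int],
                            consumers: List[str]) -> Dict[str, List[TopicPartition]]:
    """Deal partitions from all topics to consumers in round-robin order."""
    # Pool every (topic, partition) pair across all topics, in a stable order.
    all_partitions = sorted(
        (topic, p)
        for topic, count in partitions_per_topic.items()
        for p in range(count)
    )
    ordered = sorted(consumers)
    owners: Dict[str, List[TopicPartition]] = {c: [] for c in ordered}
    for i, tp in enumerate(all_partitions):
        owners[ordered[i % len(ordered)]].append(tp)
    return owners


# Pablo's example: 100 topics x 2 partitions, 10 consumers in one group.
topics = {f"topic{t:03d}": 2 for t in range(100)}
consumers = [f"consumer{c}" for c in range(10)]
assignment = cross_topic_round_robin(topics, consumers)
print({len(tps) for tps in assignment.values()})  # {20}
```

With the 200 partitions pooled, every one of the 10 consumers receives 20 of them instead of 8 consumers sitting idle.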
On Mon, Jan 7, 2013 at 10:44 AM, Neha Narkhede <[EMAIL PROTECTED]> wrote:

> Pablo,
>
> That is a good suggestion. Ideally, the partitions across all topics should
> be distributed evenly across consumer streams, instead of deciding on a
> per-topic basis. There is no particular advantage to the current scheme of
> per-topic rebalancing that I can think of. Would you mind filing a JIRA to
> track this improvement?
>
> Thanks,
> Neha
>
>
> On Mon, Jan 7, 2013 at 9:10 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
>
> > Pablo,
> >
> > Currently, the partition is the smallest unit by which we distribute data
> > among consumers (in the same consumer group). So, if the number of
> > consumers is larger than the total number of partitions in a Kafka
> > cluster (across all brokers), some consumers will never get any data.
> > This decision is made on a per-topic basis. If a consumer consumes
> > multiple topics, it would make sense to divide partitions across all
> > topics among consumers. We haven't done that yet. Part of the reason is
> > that we need to figure out how to balance the data across topics, since
> > they can be of different sizes. We can look into that post 0.8.
> >
> > For now, the solution is to increase the number of partitions on the
> > broker.
> >
> > Thanks,
> >
> > Jun
> >
> > On Mon, Jan 7, 2013 at 9:03 AM, Pablo Barrera González <[EMAIL PROTECTED]> wrote:
> >
> > > Hello
> > >
> > > We are starting to use Kafka in production, but we found an unexpected
> > > (at least for me) behavior in the use of partitions. We have a bunch of
> > > topics with a few partitions each. We try to consume all data from
> > > several consumers (just one consumer group).
> > >
> > > The problem is in the rebalance step. The rebalance splits the
> > > partitions per topic between all consumers. So if you have 100 topics
> > > with only 2 partitions each and 10 consumers, only two consumers will be
> > > used. That is, for each topic, all partitions are listed and shared
> > > between the consumers in the consumer group in order (not randomly).
> > >
> > > This behavior is also described in Algorithm 1 of the original Kafka
> > > paper [1].
> > >
> > > I don't understand this decision. Why is it split by topic? Does it make
> > > sense to divide all partitions from all topics among all the consumers
> > > in the consumer group? I don't see the reason for this, so I would like
> > > to hear your opinion before changing the code.
> > >
> > > We are using Kafka 0.7.1.
> > >
> > > Thank you in advance
> > >
> > > Pablo
> > >
> > > [1] "Kafka: a Distributed Messaging System for Log Processing", Jay
> > > Kreps, Neha Narkhede and Jun Rao.
> > > http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
> > >
> >
>
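
The per-topic behavior Pablo reports, and Jun's explanation of it, can be reproduced with a short simulation. This is a simplified sketch of a range-style split modeled on Pablo's description (each topic's partitions listed in order and shared among the sorted consumers), not Kafka 0.7.1's actual rebalance code; partitions are modeled as plain integers per topic, and the function, topic, and consumer names are hypothetical. The last lines check Jun's suggested workaround of adding partitions.

```python
from typing import Dict, List, Tuple

# (topic, partition id)
TopicPartition = Tuple[str, int]


def per_topic_range_assign(partitions_per_topic: Dict[str, int],
                           consumers: List[str]) -> Dict[str, List[TopicPartition]]:
    """Split each topic's partitions independently into contiguous ranges
    over the sorted consumer list (hypothetical model of the per-topic scheme)."""
    owners: Dict[str, List[TopicPartition]] = {c: [] for c in consumers}
    ordered = sorted(consumers)
    n = len(ordered)
    for topic in sorted(partitions_per_topic):
        count = partitions_per_topic[topic]
        per_consumer, extra = divmod(count, n)
        for pos, consumer in enumerate(ordered):
            # The first `extra` consumers each take one additional partition.
            start = per_consumer * pos + min(pos, extra)
            size = per_consumer + (1 if pos < extra else 0)
            owners[consumer].extend((topic, p) for p in range(start, start + size))
    return owners


topics = {f"topic{t:03d}": 2 for t in range(100)}  # 100 topics x 2 partitions
consumers = [f"consumer{c}" for c in range(10)]     # 10 consumers, one group
assignment = per_topic_range_assign(topics, consumers)
busy = [c for c, tps in assignment.items() if tps]
print(busy)                          # ['consumer0', 'consumer1']
print(len(assignment["consumer0"]))  # 100

# Jun's workaround: with 10 partitions per topic, every consumer gets work.
wider = per_topic_range_assign({t: 10 for t in topics}, consumers)
print(all(wider[c] for c in consumers))  # True
```

Because each topic is split on its own, the same two consumers win the only two slots for every topic, and the other eight never receive data until each topic has at least as many partitions as there are consumers.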

 