Kafka >> mail # user >> Consumer rebalance per topic


Re: Consumer rebalance per topic
Jira ticket https://issues.apache.org/jira/browse/KAFKA-687

2013/1/7 Pablo Barrera González <[EMAIL PROTECTED]>

> Thank you Jun and Neha
>
> I was trying to avoid adding more partitions. I have enough partitions if
> you count all partitions in all topics. I understand the problem with
> different data load per topic, but the current scheme does not solve this
> problem either, so we shouldn't be worse off if we consider all partitions
> from all topics at the same time.
>
> I will open the JIRA ticket to track this.
>
> Thanks again for the clarification.
>
> Cheers
>
> Pablo
>
>
>
> 2013/1/7 Neha Narkhede <[EMAIL PROTECTED]>
>
>> Pablo,
>>
>> That is a good suggestion. Ideally, the partitions across all topics should
>> be distributed evenly across consumer streams rather than assigned by a
>> per-topic decision. There is no particular advantage to the current scheme
>> of per-topic rebalancing that I can think of. Would you mind filing a JIRA
>> to track this improvement?
>>
>> Thanks,
>> Neha
>>
>>
>> On Mon, Jan 7, 2013 at 9:10 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
>>
>> > Pablo,
>> >
>> > Currently, the partition is the smallest unit by which we distribute data
>> > among consumers (in the same consumer group). So, if the number of
>> > consumers is larger than the total number of partitions in a Kafka
>> > cluster (across all brokers), some consumers will never get any data.
>> > This decision is made on a per-topic basis. If a consumer consumes
>> > multiple topics, it would make sense to divide the partitions across all
>> > topics among the consumers. We haven't done that yet. Part of the reason
>> > is that we need to figure out how to balance the data across topics,
>> > since they can be of different sizes. We can look into that post 0.8.
>> >
>> > For now, the solution is to increase the number of partitions on the
>> > broker.
>> >
>> > Thanks,
>> >
>> > Jun
>> >
>> > On Mon, Jan 7, 2013 at 9:03 AM, Pablo Barrera González <[EMAIL PROTECTED]> wrote:
>> >
>> > > Hello
>> > >
>> > > We are starting to use Kafka in production, but we found an unexpected
>> > > (at least for me) behavior in how partitions are used. We have a bunch
>> > > of topics with a few partitions each. We try to consume all the data
>> > > with several consumers (just one consumer group).
>> > >
>> > > The problem is in the rebalance step. The rebalance splits the
>> > > partitions of each topic among all consumers. So if you have 100 topics
>> > > with only 2 partitions each and 10 consumers, only two consumers will
>> > > be used. That is, for each topic, all partitions are listed and shared
>> > > among the consumers in the consumer group in order (not randomly).
>> > >
>> > > This behavior is also described in Algorithm 1 of the original Kafka
>> > > paper [1].
>> > >
>> > > I don't understand this decision. Why is it split by topic? Would it
>> > > make sense to divide all partitions from all topics among all the
>> > > consumers in the consumer group? I don't see the reason for this, so I
>> > > would like to hear your opinion before changing the code.
>> > >
>> > > We are using Kafka 0.7.1.
>> > >
>> > > Thank you in advance
>> > >
>> > > Pablo
>> > >
>> > > [1] "Kafka: a Distributed Messaging System for Log Processing", Jay
>> > > Kreps, Neha Narkhede and Jun Rao.
>> > > http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
>> > >
>> >
>>
>
>
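To make the numbers in Pablo's example concrete, here is a minimal sketch of the two strategies discussed in this thread. It is not Kafka's actual rebalance code: the function names, topic names, and consumer ids are hypothetical, and the per-topic function is only a simplified reading of the in-order, per-topic split described above. The cross-topic variant is a plain round-robin and deliberately ignores the differing data volume per topic that Jun raises.

```python
from collections import defaultdict


def assign_per_topic(topics, consumers):
    """Split each topic's partitions among the sorted consumers, in order.

    `topics` maps a topic name to its partition count (hypothetical input shape).
    """
    consumers = sorted(consumers)
    assignment = defaultdict(list)
    for topic, num_partitions in sorted(topics.items()):
        # Each topic is handled on its own, always starting at the first consumer.
        per_consumer, extra = divmod(num_partitions, len(consumers))
        start = 0
        for i, consumer in enumerate(consumers):
            count = per_consumer + (1 if i < extra else 0)
            for partition in range(start, start + count):
                assignment[consumer].append((topic, partition))
            start += count
    return dict(assignment)


def assign_across_topics(topics, consumers):
    """Round-robin over the combined partition list of all topics."""
    consumers = sorted(consumers)
    all_partitions = [
        (topic, partition)
        for topic, num_partitions in sorted(topics.items())
        for partition in range(num_partitions)
    ]
    assignment = defaultdict(list)
    for i, topic_partition in enumerate(all_partitions):
        assignment[consumers[i % len(consumers)]].append(topic_partition)
    return dict(assignment)


# The scenario from the original message: 100 topics, 2 partitions each, 10 consumers.
topics = {f"topic-{i:03d}": 2 for i in range(100)}
consumers = [f"consumer-{i}" for i in range(10)]

per_topic = assign_per_topic(topics, consumers)
across = assign_across_topics(topics, consumers)

# Number of consumers that own at least one partition under each strategy.
print(len(per_topic))  # 2
print(len(across))     # 10
```

Under these assumptions the per-topic split leaves eight of the ten consumers idle, while the combined split gives each consumer 20 of the 200 partitions. An even partition count does not imply an even data load, which is the open question noted above.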

 