We are currently facing a problem on our consumer cluster that may have a simple solution; nevertheless, I have not been able to find it yet.
Description of the problem: one Kafka topic with 24 partitions (Kafka version 0.8 Beta1) and two or more consumers in the same consumer group. Each consumer processes its partitions by aggregating the topic data into a relational database. Each consumer keeps the aggregated data locally (in a hash) before committing it to the relational database. After the database commit, the consumerConnector commits the offsets to Kafka.
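The problem: when I connect a new consumer, the consumerConnector recalculates which partitions each consumer instance reads from. Because of the local aggregation within each consumer, this currently causes our system to process topic data twice: the new owner of a partition resumes from the last committed offset, so messages that the previous owner had already aggregated, but whose offsets were not yet committed, are aggregated again.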
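To see how many partitions change hands when a consumer joins, here is a minimal sketch of range-style assignment, modeled on how I understand the 0.8 high-level consumer to divide partitions (sort partitions and consumers, give each consumer a contiguous block, hand out the remainder to the first consumers). The real connector assigns to consumer *threads*; the consumer names and the helper functions `range_assign` and `moved_partitions` are hypothetical.

```python
def range_assign(partitions, consumers):
    """Range-style assignment: contiguous blocks, remainder to the first consumers."""
    parts = sorted(partitions)
    cons = sorted(consumers)
    per_consumer, extra = divmod(len(parts), len(cons))
    assignment = {}
    start = 0
    for i, consumer in enumerate(cons):
        # The first `extra` consumers each take one additional partition.
        count = per_consumer + (1 if i < extra else 0)
        assignment[consumer] = parts[start:start + count]
        start += count
    return assignment


def moved_partitions(before, after):
    """Return {partition: (old_owner, new_owner)} for partitions whose owner changed."""
    old_owner = {p: c for c, ps in before.items() for p in ps}
    new_owner = {p: c for c, ps in after.items() for p in ps}
    return {p: (old_owner.get(p), owner)
            for p, owner in new_owner.items() if old_owner.get(p) != owner}


# 24 partitions, going from two consumers (hypothetical names) to three.
partitions = range(24)
two = range_assign(partitions, ["consumer-a", "consumer-b"])
three = range_assign(partitions, ["consumer-a", "consumer-b", "consumer-c"])
moved = moved_partitions(two, three)
```

In this model, adding a third consumer moves 12 of the 24 partitions: 8 to 11 from consumer-a to consumer-b, and 16 to 23 from consumer-b to consumer-c. Each of those is resumed by its new owner from the last committed offset, which is exactly where uncommitted local aggregates get counted twice.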
Is there any way to catch the partition-reassignment event in the consumerConnector, so that we can commit the offsets and the data before it reconnects to the new partitions?
For now, your best bet is to use the SimpleConsumer and implement your own rebalancing strategy. Another, hackier approach is to register ZooKeeper watches on the /consumers/<group>/owners path, which reflects partition ownership changes.
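A minimal sketch of the bookkeeping behind the watch-based approach, assuming a callback that receives the current `{partition: owner_id}` mapping read from the owners path. In a real deployment that callback would be driven by a ZooKeeper child watch on /consumers/<group>/owners/<topic> (watches are one-shot, so it must be re-registered after each firing); here it is called directly. The class `AggregatingConsumer`, its methods, and the dict stand-ins for the database and offset store are all hypothetical.

```python
from collections import defaultdict


class AggregatingConsumer:
    """Hypothetical consumer that aggregates per partition and flushes on ownership loss."""

    def __init__(self, consumer_id, db, offset_store):
        self.consumer_id = consumer_id
        self.db = db                      # stand-in for the relational database
        self.offset_store = offset_store  # stand-in for the committed-offset store
        self.aggregates = defaultdict(lambda: defaultdict(int))  # partition -> key -> sum
        self.next_offset = {}             # partition -> offset after last consumed message
        self.owned = set()

    def consume(self, partition, offset, key, value):
        # Aggregate locally; nothing is committed yet.
        self.aggregates[partition][key] += value
        self.next_offset[partition] = offset + 1

    def flush(self, partition):
        # Write the aggregates first, then commit the offset (same order as above).
        for key, total in self.aggregates.pop(partition, {}).items():
            self.db[key] = self.db.get(key, 0) + total
        if partition in self.next_offset:
            self.offset_store[partition] = self.next_offset.pop(partition)

    def on_owners_changed(self, owners):
        """Hypothetical watch callback: flush every partition this consumer just lost."""
        now_owned = {p for p, owner in owners.items() if owner == self.consumer_id}
        for partition in self.owned - now_owned:
            self.flush(partition)
        self.owned = now_owned


db, offsets = {}, {}
c = AggregatingConsumer("consumer-a", db, offsets)
c.on_owners_changed({0: "consumer-a", 1: "consumer-a"})
c.consume(0, 41, "clicks", 3)
c.consume(1, 7, "clicks", 2)
c.on_owners_changed({0: "consumer-a", 1: "consumer-b"})  # partition 1 moves away
```

After the second callback, only partition 1 has been flushed and committed; partition 0 stays buffered. Because the watch fires asynchronously after ownership has already changed, the new owner may start reading before the flush completes, so this narrows the duplicate window rather than closing it; controlling the handover fully is what the SimpleConsumer route buys you.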
Thanks,
Neha

On Oct 8, 2013 2:12 AM, "Markus Roder" <[EMAIL PROTECTED]> wrote: