Hello -- is it possible for our code to stall a ConsumerConnector, so that it does no consuming for, say, 30 seconds, until we can be sure that all the other ConsumerConnectors have rebalanced?
It seems that the first ConsumerConnector to come up is prefetching some data, and we end up with duplicate messages. We looked at the code for the high-level consumer (core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala) and it looks like it empties some queues after a rebalance, but we still see duplicate messages.
I'm sure this question has been asked before :-) but this is our first time really working with the high-level consumer, and this caught us by surprise. If there is *no* data in Kafka when we start, and we wait until everything balances before sending data in, everything works fine. But if there is some data already sitting in the brokers, we seem to get dupes, even when each thread sleeps for many seconds after creating its ConsumerConnector.
Just to be clear, I'm not asking that we solve "duplicate messages on crash before commit to Zookeeper", just an apparent problem where, if Kafka already has some data and we start our ConsumerConnectors, we get dupe data because some consumers come up before others.
On Thu, Jun 13, 2013 at 7:34 PM, Philip O'Toole <[EMAIL PROTECTED]> wrote:
Are your messages compressed in batches? If so, some dups are expected during rebalance; in 0.8, such dups are eliminated. Other than that, rebalance shouldn't cause dups, since we commit consumed offsets to ZK before doing a rebalance.
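To make the batch effect concrete, here is a toy Python model (not Kafka code; every function name is hypothetical). It assumes, as a simplification, that with compressed batches the offset committed to ZK can only advance at a batch boundary, whereas with per-message offsets (assumed here as a model of the 0.8 behavior) it advances after every message, and the new owner skips anything below the committed offset when it refetches.

```python
from typing import List, Tuple

# Toy model of a single partition (hypothetical, not the Kafka API).
# Each inner list stands in for one compressed batch; a message's logical
# offset is simply its position in the flattened partition.


def first_owner(batches: List[List[str]], stop_after: int,
                per_message_offsets: bool) -> Tuple[List[str], int]:
    """Deliver `stop_after` messages, then stop as if a rebalance took the
    partition away. Returns (delivered messages, offset committed to ZK)."""
    delivered: List[str] = []
    committed = 0
    pos = 0  # logical offset of the next message
    for batch in batches:
        for msg in batch:
            if len(delivered) == stop_after:
                # Rebalance: only the currently committable position is saved.
                return delivered, committed
            delivered.append(msg)
            pos += 1
            if per_message_offsets:
                committed = pos  # every message advances the committable offset
        committed = pos  # whole batch consumed: batch boundary reached
    return delivered, committed


def second_owner(batches: List[List[str]], committed: int) -> List[str]:
    """Messages the new owner delivers: everything at or after `committed`;
    earlier messages in the refetched batch are skipped."""
    flat = [msg for batch in batches for msg in batch]
    return [msg for offset, msg in enumerate(flat) if offset >= committed]


def duplicates(batches: List[List[str]], stop_after: int,
               per_message_offsets: bool) -> List[str]:
    """Messages delivered by both owners across the rebalance."""
    seen, committed = first_owner(batches, stop_after, per_message_offsets)
    return [msg for msg in second_owner(batches, committed) if msg in seen]


if __name__ == "__main__":
    batches = [["m0", "m1", "m2"], ["m3", "m4", "m5"]]
    print(duplicates(batches, 4, per_message_offsets=False))  # ['m3']
    print(duplicates(batches, 4, per_message_offsets=True))   # []
```

With the rebalance landing right after m3, the batched model re-delivers m3 because the committable position is still the start of the second batch; with per-message offsets nothing is repeated.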
Jun
If there are no offsets stored in ZK, I think it's possible to get some dups during startup. Once the offsets are in ZK, there shouldn't be dups during subsequent rebalances.
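A second toy model of the startup case, again hypothetical Python rather than Kafka code: a new owner resumes from the offset stored in ZK if there is one, and otherwise from a reset position, assumed here to be the earliest offset. If the first consumer's progress has not reached ZK by the time the partition moves, everything it delivered is delivered again; once an offset is stored, nothing is. It doesn't try to model why the offset might be missing from ZK at that moment.

```python
from typing import Dict, List

# Toy model of a partition handoff at startup (all names are hypothetical).


def resume_offset(zk_offsets: Dict[int, int], partition: int,
                  reset_offset: int = 0) -> int:
    """Committed offset if ZK has one, else the reset position
    (assumed here to be the earliest offset, 0)."""
    return zk_offsets.get(partition, reset_offset)


def redelivered(log: List[str], consumed_by_first: int,
                zk_offsets: Dict[int, int], partition: int = 0) -> List[str]:
    """Messages the new owner delivers that the first owner already delivered.
    The first owner is assumed to have delivered log[0:consumed_by_first]."""
    start = resume_offset(zk_offsets, partition)
    return log[start:consumed_by_first]


if __name__ == "__main__":
    log = ["a", "b", "c", "d"]
    print(redelivered(log, 2, {}))      # no offset in ZK: ['a', 'b']
    print(redelivered(log, 2, {0: 2}))  # offset 2 stored: []
```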
Jun

On Fri, Jun 14, 2013 at 2:04 PM, Philip O'Toole <[EMAIL PROTECTED]> wrote: