
Kafka >> mail # user >> Stall high-level 0.72 ConsumerConnector until all balanced? Avoid message dupes?


Stall high-level 0.72 ConsumerConnector until all balanced? Avoid message dupes?
Hello -- is it possible for our code to stall a ConsumerConnector from
doing any consuming for, say, 30 seconds, until we can be sure that
all other ConsumerConnectors have rebalanced?
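A minimal sketch of the kind of stall being asked about, assuming each consumer can somehow tell the others that its rebalance has finished. The Kafka 0.7 high-level consumer is not assumed to expose such a hook, so `StartGate`, `markBalanced` and `awaitBalanced` are hypothetical names. Worker threads wait on a `java.util.concurrent.CountDownLatch` with a timeout (e.g. 30 seconds) before they start pulling messages. A gate like this only controls when *your* code reads from the stream. It does not stop the connector from fetching in the background, so on its own it may not prevent the duplicates described below.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical start gate (not part of the Kafka API): threads block in
 * awaitBalanced() until `expected` consumers have called markBalanced(),
 * or until the timeout elapses.
 */
final class StartGate {
    private final CountDownLatch latch;

    StartGate(int expected) {
        this.latch = new CountDownLatch(expected);
    }

    /** Called once by each consumer after it believes its rebalance is done. */
    void markBalanced() {
        latch.countDown();
    }

    /** Returns true if every consumer reported in before the timeout expired. */
    boolean awaitBalanced(long timeout, TimeUnit unit) {
        try {
            return latch.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }
}
```

A worker would call `gate.awaitBalanced(30, TimeUnit.SECONDS)` and only then start iterating its stream, treating a `false` result as "not everyone reported in; proceed anyway or abort".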

It seems that the first ConsumerConnector to come up is prefetching
some data, and we end up with duplicate messages. We looked at the
code for the high-level consumer
(core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala)
and it looks like it empties some queues after a rebalance, but we
still see duplicate messages.

I'm sure this question has been asked before :-) but this is our first
time really working with the high-level consumer, and this caught us
by surprise. When there is *no* data in Kafka, and we wait until everything
balances before sending data in, everything works fine. But if there is
some data already sitting in the brokers, we seem to get dupes, even when
each thread sleeps for many seconds after creating the
ConsumerConnector.
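If some duplicates around rebalances turn out to be unavoidable, one common mitigation is to make processing idempotent on the consumer side. The sketch below assumes each message carries a unique ID assigned by the producer; Kafka 0.7 does not provide one, so this ID and the `RecentIdFilter` class are hypothetical. It remembers the most recent `capacity` IDs in an access-ordered `LinkedHashMap` (an LRU set) and reports whether an ID is being seen for the first time.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical consumer-side de-duplicator. Assumes every message carries a
 * producer-assigned unique ID. Only the most recent `capacity` IDs are kept.
 */
final class RecentIdFilter {
    private final Set<String> seen;

    RecentIdFilter(final int capacity) {
        // Access-ordered LinkedHashMap that evicts its eldest entry acts as an LRU set.
        this.seen = Collections.newSetFromMap(new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity;
            }
        });
    }

    /** Returns true the first time an ID is seen, false for a remembered duplicate. */
    synchronized boolean firstTime(String messageId) {
        return seen.add(messageId);
    }
}
```

The bounded window is a trade-off: it keeps memory flat, but a duplicate arriving after its ID has been evicted will not be caught, so `capacity` should comfortably cover the number of messages that might be redelivered after a rebalance.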

Are we missing something?

Thanks,

Philip

 
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB