Kafka >> mail # user >> Stall high-level 0.72 ConsumerConnector until all balanced? Avoid message dupes?


Philip OToole 2013-06-14, 02:35
Philip OToole 2013-06-14, 02:57
Jun Rao 2013-06-14, 04:16
Philip OToole 2013-06-14, 04:27
Re: Stall high-level 0.72 ConsumerConnector until all balanced? Avoid message dupes?
On Thu, Jun 13, 2013 at 9:15 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
> Are your messages compressed in batches? If so, some dups are expected
> during rebalance. In 0.8, such dups are eliminated. Other than that,
> rebalance shouldn't cause dups since we commit consumed offsets to ZK
> before doing a rebalance.

Jun -- quick clarification. Is this guarantee valid even if there is
no state in Zookeeper? If the consumers that will rebalance are coming
up for the *very first time*? I.e.:

[zk: localhost:2181(CONNECTED) 1] ls /consumers
Node does not exist: /consumers
[zk: localhost:2181(CONNECTED) 2]
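
For reference, here is a toy Python model of the behaviour described in the quoted reply above: offsets are committed before a rebalance, so duplicates should only appear when messages are compressed in batches. This is not Kafka code. The function names are hypothetical, and the idea that the committed offset only advances at batch boundaries is an assumption made for illustration. It does not model the first-startup case with no state in ZooKeeper.

```python
# Toy model only: not Kafka code. All names here are hypothetical.

def committable_offset(consumed, batch_size):
    """Offset that can be committed after `consumed` messages.

    Assumption: with batch compression the offset only advances at
    batch boundaries; batch_size=1 models uncompressed messages.
    """
    return (consumed // batch_size) * batch_size


def simulate_rebalance(total, consumed_before, batch_size):
    """Return the messages delivered twice across one rebalance."""
    # Messages the first owner handed to the application.
    first_owner = list(range(consumed_before))
    # The first owner commits its offset before releasing the partition.
    committed = committable_offset(consumed_before, batch_size)
    # The new owner resumes from the committed offset.
    second_owner = list(range(committed, total))
    return sorted(set(first_owner) & set(second_owner))


print(simulate_rebalance(total=10, consumed_before=7, batch_size=1))  # []
print(simulate_rebalance(total=10, consumed_before=7, batch_size=4))  # [4, 5, 6]
```

In this model the uncompressed case produces no duplicates, while a rebalance in the middle of a batch of four redelivers the part of the batch that was already consumed.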

Philip

>
> Thanks,
>
> Jun
>
>
> On Thu, Jun 13, 2013 at 7:34 PM, Philip O'Toole <[EMAIL PROTECTED]> wrote:
>
>> Hello -- is it possible for our code to stall a ConsumerConnector from
>> doing any consuming for, say, 30 seconds, until we can be sure that
>> all other ConsumerConnectors are rebalanced?
>>
>> It seems that the first ConsumerConnector to come up is prefetching
>> some data, and we end up with duplicate messages. We looked at the
>> code for the high-level consumer
>> (core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala)
>> and it looks like it empties some queues after a rebalance, but we
>> still see duplicate messages.
>>
>> I'm sure this question has been asked before :-) but this is our first
>> time really working with the high-level consumer, and this caught us
>> by surprise. When there is *no* data in Kafka, and we wait until everything
>> balances and then send data in, everything works fine, but if there is
>> some data sitting in the brokers, we seem to get dupes, even when
>> each thread sleeps for many seconds after creating the
>> ConsumerConnector.
>>
>> Are we missing something?
>>
>> Thanks,
>>
>> Philip
>>

 
Jun Rao 2013-06-15, 15:57