Kafka >> mail # user >> Stall high-level 0.72 ConsumerConnector until all balanced? Avoid message dupes?


Re: Stall high-level 0.72 ConsumerConnector until all balanced? Avoid message dupes?
If there are no offsets stored in ZK, I think it's possible to get some
dups during startup. Once the offsets are in ZK, there shouldn't be dups
during subsequent rebalances.

Thanks,

Jun
On Fri, Jun 14, 2013 at 2:04 PM, Philip O'Toole <[EMAIL PROTECTED]> wrote:

> On Thu, Jun 13, 2013 at 9:15 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
> > Are your messages compressed in batches? If so, some dups are expected
> > during rebalance. In 0.8, such dups are eliminated. Other than that,
> > rebalance shouldn't cause dups since we commit consumed offsets to ZK
> > before doing a rebalance.
>
> Jun -- quick clarification. Is this guarantee valid even if there is
> no state in Zookeeper? If the consumers that will rebalance are coming
> up for the *very first time*? I.e.:
>
> [zk: localhost:2181(CONNECTED) 1] ls /consumers
> Node does not exist: /consumers
> [zk: localhost:2181(CONNECTED) 2]
>
> Philip
>
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Thu, Jun 13, 2013 at 7:34 PM, Philip O'Toole <[EMAIL PROTECTED]>
> wrote:
> >
> >> Hello -- is it possible for our code to stall a ConsumerConnector from
> >> doing any consuming for, say, 30 seconds, until we can be sure that
> >> all other ConsumerConnectors are rebalanced?
> >>
> >> It seems that the first ConsumerConnector to come up is prefetching
> >> some data, and we end up with duplicate messages. We looked at the
> >> code for the high-level consumer
> >> (core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala)
> >> and it looks like it empties some queues after a rebalance, but we
> >> still see duplicate messages.
> >>
> >> I'm sure this question has been asked before :-) but this is our first
> >> time really working with the high-level consumer, and this caught us
> >> by surprise. When there is *no* data in Kafka and we wait until everything
> >> balances before sending data in, everything works fine, but if there is
> >> some data sitting in the brokers, we seem to get dupes, even when
> >> each thread sleeps for many seconds after creating the
> >> ConsumerConnector.
> >>
> >> Are we missing something?
> >>
> >> Thanks,
> >>
> >> Philip
> >>
>
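
Since the thread concludes that some dups are possible at first startup (no offsets in ZK yet) and with batch-compressed messages in 0.7, one common workaround is to make the consuming side tolerate duplicates. The following is only a rough sketch of that idea, not anything from the Kafka code base: it assumes each message carries an application-level unique ID, and the names `SeenFilter` and `deduplicate` are made up for illustration. Plain tuples stand in for messages so the logic can be read without a Kafka client.

```python
from collections import OrderedDict


class SeenFilter:
    """Bounded memory of recently seen message IDs (hypothetical helper)."""

    def __init__(self, capacity=10000):
        self.capacity = capacity
        self._seen = OrderedDict()

    def is_new(self, message_id):
        """Return True the first time an ID is seen, False on a repeat."""
        if message_id in self._seen:
            # Refresh the entry so recently repeated IDs are evicted last.
            self._seen.move_to_end(message_id)
            return False
        self._seen[message_id] = True
        if len(self._seen) > self.capacity:
            # Evict the oldest ID to keep memory bounded.
            self._seen.popitem(last=False)
        return True


def deduplicate(messages, seen):
    """Yield payloads whose ID has not been seen before.

    `messages` is an iterable of (message_id, payload) pairs; the ID is an
    assumption about the application's message format, not a Kafka field.
    """
    for message_id, payload in messages:
        if seen.is_new(message_id):
            yield payload


if __name__ == "__main__":
    seen = SeenFilter(capacity=1000)
    # The third message repeats the first, as might happen after a rebalance.
    incoming = [("id-1", "alpha"), ("id-2", "beta"), ("id-1", "alpha")]
    for payload in deduplicate(incoming, seen):
        print(payload)
```

The capacity bound is a trade-off: a duplicate that arrives after its ID has been evicted will pass through, so the window should be sized to cover the amount of data a consumer might prefetch before a rebalance.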