Kafka >> mail # user >> 0.8 behavior change: consumer "re-receives" last batch of messages in a topic?


Re: 0.8 behavior change: consumer "re-receives" last batch of messages in a topic?
Hi,

I noticed the same thing. In 0.8.0 the offset passed to the fetch is the
offset you want to start reading from, not the offset where you left off.
The last offset read from the previous batch really is the 'last offset', so
you need to save it and ask for it +1. Otherwise you keep asking for that
last offset, which is still valid, so the fetch keeps returning it.

Be careful with the +1 logic: don't keep adding 1 when a fetch returns
nothing. The offset you request should always be 'last offset read +1'.
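
Here is a rough sketch of that bookkeeping. It does not use the real Kafka
API: FakeLog, Message and consume are made-up stand-ins, and FakeLog.fetch
only imitates the "start at the offset you pass in" behavior described above.
The check that skips messages below the expected offset is the same guard
Phil describes in his message below.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetTracking {
    // Hypothetical message type: the offset is a message number, not a byte position.
    static final class Message {
        final long offset;
        final String payload;
        Message(long offset, String payload) {
            this.offset = offset;
            this.payload = payload;
        }
    }

    // Hypothetical in-memory stand-in for a broker log (NOT the Kafka API).
    static final class FakeLog {
        private final List<Message> messages = new ArrayList<>();

        void append(String payload) {
            messages.add(new Message(messages.size(), payload));
        }

        // Returns up to maxMessages messages starting AT startOffset (inclusive).
        List<Message> fetch(long startOffset, int maxMessages) {
            List<Message> out = new ArrayList<>();
            for (long o = Math.max(startOffset, 0);
                 o < messages.size() && out.size() < maxMessages; o++) {
                out.add(messages.get((int) o));
            }
            return out;
        }
    }

    // Polls the log a fixed number of times and returns the payloads processed.
    static List<String> consume(FakeLog log, long startOffset, int polls) {
        List<String> seen = new ArrayList<>();
        long nextOffset = startOffset;
        for (int i = 0; i < polls; i++) {
            for (Message m : log.fetch(nextOffset, 2)) {
                if (m.offset < nextOffset) {
                    continue; // defensive: already processed, skip it
                }
                seen.add(m.payload);
                nextOffset = m.offset + 1; // last offset read + 1
            }
            // Empty batch: nextOffset stays where it is, no extra +1.
        }
        return seen;
    }

    public static void main(String[] args) {
        FakeLog log = new FakeLog();
        log.append("a");
        log.append("b");
        log.append("c");
        System.out.println(consume(log, 0, 5)); // prints [a, b, c]
    }
}
```

The last three polls in main return nothing, and nextOffset stays at 3, so a
message appended later at offset 3 would still be picked up.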

I think this happened with the change from file-byte offsets to offsets that
are message numbers.

Chris
On Wed, Mar 13, 2013 at 2:49 PM, Hargett, Phil <
[EMAIL PROTECTED]> wrote:

> I have 2 consumers in our scenario, reading from different brokers. Each
> broker is running standalone, although each has its own dedicated
> zookeeper instance for bookkeeping.
>
> After switching from 0.7.2, I noticed that both consumers exhibited high
> CPU usage. I am not yet exploiting any zookeeper knowledge in my consumer
> code; I am just making calls to the SimpleConsumer in the java API, passing
> the host and port of my broker.
>
> In 0.7.2, I kept the last offset from messages received via a fetch, and
> used that as the offset passed into the fetch method when receiving the
> next message set.
>
> With 0.8, I had to add a check to drop fetched messages when the message's
> offset was less than my own offset, based on the last message I saw. If I
> didn't make that change, it seemed like the last 200 or so messages in my
> topic (probably matches a magic batch size configured somewhere in all of
> this code) were continually refetched.
>
> In this scenario, my topic was no longer accumulating messages, as I had
> turned off the producer, so I was expecting the fetches to eventually
> either block, return an empty message set, or fail (not sure of semantics
> of fetch). Continually receiving the last "batch" of messages at the end of
> the topic was not a semantic I expected.
>
> Is this an intended change in behavior—or do I need to write better
> consumer code?
>
> Guidance, please.
>
> :)