Kafka >> mail # user >> 0.8 behavior change: consumer "re-receives" last batch of messages in a topic?


Re: 0.8 behavior change: consumer "re-receives" last batch of messages in a topic?
Thanks Neha,

I somehow missed 'nextOffset' when converting my logic. I'm assuming
the +1 trick works by chance and that I shouldn't assume the next offset is +1?

(It is a minor fix on my end; I'm just curious where +1 might not work.)
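
(For reference, here's a minimal sketch of the loop I think this implies, based
on the 0.8 java API; the broker address, topic, partition, and starting offset
are placeholders rather than my actual values:)

import java.nio.ByteBuffer;

import kafka.api.FetchRequest;
import kafka.api.FetchRequestBuilder;
import kafka.javaapi.FetchResponse;
import kafka.javaapi.consumer.SimpleConsumer;
import kafka.message.MessageAndOffset;

public class NextOffsetLoopSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder connection and topic details.
        String topic = "my-topic";
        int partition = 0;
        String clientId = "sketch-client";
        SimpleConsumer consumer =
            new SimpleConsumer("broker-host", 9092, 100000, 64 * 1024, clientId);

        long fetchOffset = 0L; // for illustration; real code would look up a starting offset

        while (true) {
            FetchRequest request = new FetchRequestBuilder()
                .clientId(clientId)
                .addFetch(topic, partition, fetchOffset, 100000)
                .build();
            FetchResponse response = consumer.fetch(request);
            // (Error handling and backoff when the set comes back empty are omitted.)

            for (MessageAndOffset messageAndOffset : response.messageSet(topic, partition)) {
                ByteBuffer payload = messageAndOffset.message().payload();
                byte[] bytes = new byte[payload.limit()];
                payload.get(bytes);
                System.out.println(messageAndOffset.offset() + ": " + new String(bytes, "UTF-8"));

                // Advance using the offset of the *next* message, rather than
                // assuming it is offset() + 1.
                fetchOffset = messageAndOffset.nextOffset();
            }
        }
    }
}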

Thanks,

Chris
On Wed, Mar 13, 2013 at 3:14 PM, Neha Narkhede <[EMAIL PROTECTED]> wrote:

> In 0.8, the iterator over the data returned in the FetchResponse is over
> MessageAndOffset. This class has a nextOffset() API, which is the offset of
> the next message in the message set. So, the nextOffset() value returned on
> the last message in the message set should be used as the fetch offset in
> the following fetch() call to Kafka.
>
> Thanks,
> Neha
>
>
> On Wed, Mar 13, 2013 at 11:49 AM, Hargett, Phil <[EMAIL PROTECTED]> wrote:
>
> > I have 2 consumers in our scenario, reading from different brokers. Each
> > broker is running standalone, although each has its own dedicated
> > zookeeper instance for bookkeeping.
> >
> > After switching from 0.7.2, I noticed that both consumers exhibited high
> > CPU usage. I am not yet exploiting any zookeeper knowledge in my consumer
> > code; I am just making calls to the SimpleConsumer in the Java API,
> > passing the host and port of my broker.
> >
> > In 0.7.2, I kept the last offset from messages received via a fetch, and
> > used that as the offset passed into the fetch method when receiving the
> > next message set.
> >
> > With 0.8, I had to add a check to drop fetched messages when the message's
> > offset was less than my own offset, based on the last message I saw. If I
> > didn't make that change, it seemed like the last 200 or so messages in my
> > topic (probably matches a magic batch size configured somewhere in all of
> > this code) were continually refetched.
> >
> > In this scenario, my topic was no longer accumulating messages, as I had
> > turned off the producer, so I was expecting the fetches to eventually
> > either block, return an empty message set, or fail (I'm not sure of the
> > semantics of fetch). Continually re-receiving the last "batch" of messages
> > at the end of the topic was not behavior I expected.
> >
> > Is this an intended change in behavior—or do I need to write better
> > consumer code?
> >
> > Guidance, please.
> >
> > :)
>
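
For completeness, here is a minimal sketch of the check Phil describes above,
assuming a fetch loop like the one sketched earlier in the thread; the class,
method, and variable names are illustrative, not from the original consumer
code:

import kafka.javaapi.FetchResponse;
import kafka.message.MessageAndOffset;

public final class SkipOldMessagesSketch {
    // Returns the offset to use for the next fetch request.
    static long consumeBatch(FetchResponse response, String topic, int partition, long fetchOffset) {
        for (MessageAndOffset messageAndOffset : response.messageSet(topic, partition)) {
            if (messageAndOffset.offset() < fetchOffset) {
                // Message from before the offset we asked for (this can happen,
                // for example, with compressed message sets); skip it rather
                // than reprocessing it.
                continue;
            }
            // ... handle messageAndOffset.message() here ...
            fetchOffset = messageAndOffset.nextOffset();
        }
        return fetchOffset;
    }
}

Note that this guard complements, rather than replaces, advancing the fetch
offset with nextOffset() as described earlier in the thread.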