Kafka, mail # user - 0.8 behavior change: consumer "re-receives" last batch of messages in a topic?


Re: 0.8 behavior change: consumer "re-receives" last batch of messages in a topic?
Neha Narkhede 2013-03-13, 19:58
+1 works every time. We provided a nextOffset() API for convenience just in
case this changes in the future.
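
The pattern can be sketched roughly as below. This is a toy model only, not the Kafka client: the in-memory log, the fetch() helper, the consume() loop and this MessageAndOffset class are hypothetical stand-ins, and the sketch simply assumes that a fetch returns messages starting at the requested offset, inclusive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NextOffsetSketch {

    // Hypothetical stand-in for the 0.8 MessageAndOffset; NOT the Kafka class.
    static final class MessageAndOffset {
        private final String message;
        private final long offset;

        MessageAndOffset(String message, long offset) {
            this.message = message;
            this.offset = offset;
        }

        String message() { return message; }

        long offset() { return offset; }

        // In this toy log the next offset happens to be offset + 1.
        long nextOffset() { return offset + 1; }
    }

    // Hypothetical fetch: returns up to maxMessages, starting AT fetchOffset (inclusive).
    static List<MessageAndOffset> fetch(List<String> log, long fetchOffset, int maxMessages) {
        List<MessageAndOffset> batch = new ArrayList<>();
        for (long o = fetchOffset; o < log.size() && batch.size() < maxMessages; o++) {
            batch.add(new MessageAndOffset(log.get((int) o), o));
        }
        return batch;
    }

    // Reads until a fetch comes back empty; returns the offset to use for the next fetch.
    static long consume(List<String> log, long startOffset, int maxMessages, List<String> sink) {
        long fetchOffset = startOffset;
        while (true) {
            List<MessageAndOffset> batch = fetch(log, fetchOffset, maxMessages);
            if (batch.isEmpty()) {
                return fetchOffset; // caught up; a real consumer would back off and retry
            }
            for (MessageAndOffset m : batch) {
                sink.add(m.message());
                // Advance with nextOffset() rather than reusing the message's own offset.
                fetchOffset = m.nextOffset();
            }
        }
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList("m0", "m1", "m2", "m3", "m4");
        List<String> seen = new ArrayList<>();
        long next = consume(log, 0L, 2, seen);
        System.out.println(seen + " next fetch offset = " + next);
    }
}
```

In this toy model, passing the last message's own offset() to the following fetch would return that message again, which resembles the re-receive symptom described further down the thread.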

Thanks,
Neha
On Wed, Mar 13, 2013 at 12:42 PM, Chris Curtin <[EMAIL PROTECTED]> wrote:

> Thanks Neha,
>
> I somehow missed the 'nextOffset' when converting my logic. I'm assuming
> the +1 trick works by chance and I shouldn't assume the next offset is +1?
>
> (It is minor for me to fix, I'm just curious where +1 might not work.)
>
> Thanks,
>
> Chris
>
>
> On Wed, Mar 13, 2013 at 3:14 PM, Neha Narkhede <[EMAIL PROTECTED]> wrote:
>
> > In 0.8, the iterator over the data returned in the FetchResponse is over
> > MessageAndOffset. This class has a nextOffset() API, which returns the
> > offset of the next message in the message set. So, the nextOffset() value
> > returned on the last message in the message set should be used as the
> > fetch offset in the following fetch() call to Kafka.
> >
> > Thanks,
> > Neha
> >
> >
> > On Wed, Mar 13, 2013 at 11:49 AM, Hargett, Phil <[EMAIL PROTECTED]> wrote:
> >
> > > I have 2 consumers in our scenario, reading from different brokers.
> > > Each broker is running standalone, although each has its own dedicated
> > > zookeeper instance for bookkeeping.
> > >
> > > After switching from 0.7.2, I noticed that both consumers exhibited
> > > high CPU usage. I am not yet exploiting any zookeeper knowledge in my
> > > consumer code; I am just making calls to the SimpleConsumer in the java
> > > API, passing the host and port of my broker.
> > >
> > > In 0.7.2, I kept the last offset from messages received via a fetch,
> > > and used that as the offset passed into the fetch method when receiving
> > > the next message set.
> > >
> > > With 0.8, I had to add a check to drop fetched messages when the
> > > message's offset was less than my own offset, based on the last message
> > > I saw. If I didn't make that change, it seemed like the last 200 or so
> > > messages in my topic (probably matches a magic batch size configured
> > > somewhere in all of this code) were continually refetched.
> > >
> > > In this scenario, my topic was no longer accumulating messages, as I
> > > had turned off the producer, so I was expecting the fetches to
> > > eventually either block, return an empty message set, or fail (not sure
> > > of semantics of fetch). Continually receiving the last "batch" of
> > > messages at the end of the topic was not a semantic I expected.
> > >
> > > Is this an intended change in behavior, or do I need to write better
> > > consumer code?
> > >
> > > Guidance, please.
> > >
> > > :)
> >
>