Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Kafka, mail # user - High Level Consumer delivery semantics


Copy link to this message
-
Re: High Level Consumer delivery semantics
Guozhang Wang 2014-01-30, 00:00
Hi Clark,

1. This is true, you need to synchronize these consumer threads when
calling commitOffsets();

2. If you are asking what if the consumer thread crashed after

currentTopicInfo.resetConsumeOffset(consumedOffset)

within the next() call, then on its startup, it will lose all these
in-memory offsets, and read from the ZK which will be smaller than the
current value, still leading to duplicates but not data losses.

Guozhang
On Wed, Jan 29, 2014 at 12:31 PM, Clark Breyman <[EMAIL PROTECTED]> wrote:

> Guozhang,
>
> Thank make sense except for the following:
>
> - the ZookeeperConsumerConnector.commitOffsets() method commits the current
> value of PartitionTopicInfo.consumeOffset  for all of the active streams.
>
> - the ConsumerIterator in the streams advances the value of
> PartitionTopicInfo.consumeOffset *before* next() returns, not after the
> processing on that message is complete.
>
> If you have multiple threads consuming, thread A calling commitOffsets()
> may commit thread B's retrieved but unprocessed message, no?
>
>
> On Wed, Jan 29, 2014 at 12:20 PM, Guozhang Wang <[EMAIL PROTECTED]>
> wrote:
>
> > Hi Clark,
> >
> > In practice, the client app code need to always commit offset after it
> has
> > processed the messages, and hence only the second case may happen,
> leading
> > to "at least once".
> >
> > Guozhang
> >
> >
> > On Wed, Jan 29, 2014 at 11:51 AM, Clark Breyman <[EMAIL PROTECTED]>
> wrote:
> >
> > > Wrestling through the at-least/most-once semantics of my application
> and
> > I
> > > was hoping for some confirmation of the semantics. I'm not sure I can
> > > classify the high level consumer as either  type.
> > >
> > > False ack scenario:
> > > - Thread A: call next() on the ConsumerIterator, advancing the
> > > PartitionTopicInfo offset
> > > - Thread B: commitOffsets() flushed offset of incomplete message to ZK
> > > - Thread A: fail processing (e.g. kill -9)
> > >
> > > False retry scenario:
> > > - Thread A: call next() & successfully process, kill -9 before
> > > commitOffsets either in thread or in parallel.
> > >
> > > Is this right or am I missing something (likely)? Seems like the
> > semantics
> > > are essentially approximately once.
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>

--
-- Guozhang