Kafka >> mail # user >> why was auto.offset.reset default changed (so late in the game)?


Re: why was auto.offset.reset default changed (so late in the game)?
Jun,

I just think it's a major change, that might have warranted a heads up, but
it's all good (I've got things looking good with it now).  I simply changed
the default behavior everywhere to explicitly initialize consumers with
'smallest' by default (whereas previously this was an ignored config value).
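
As an illustration of setting the reset policy explicitly rather than relying on the default, here is a minimal sketch. It only builds a java.util.Properties object. The property names are the 0.8 high-level consumer names as I understand them; the class name, method name, ZooKeeper address, and group id are placeholders invented for this example.

```java
import java.util.Properties;

// Hypothetical helper: builds consumer properties with an explicit reset policy,
// so behavior does not depend on whichever default the Kafka build ships with.
class ExplicitOffsetReset {

    static Properties consumerProps(String zkConnect, String groupId) {
        Properties props = new Properties();
        props.put("zookeeper.connect", zkConnect); // address supplied by the caller
        props.put("group.id", groupId);
        // Start from the earliest available message when the group
        // has no valid stored offset.
        props.put("auto.offset.reset", "smallest");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder values, for illustration only.
        Properties props = consumerProps("localhost:2181", "example-group");
        System.out.println(props.getProperty("auto.offset.reset"));
    }
}
```

The resulting Properties object would then be handed to ConsumerConfig when the consumer is created.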

I don't quite see the migration use case as a rationale for the change,
since in the steady state, after people are done migrating from 0.7 to
0.8, this default setting will be the norm for a long time to come.
Instead, it would make sense for people to use this mode explicitly when
building a migration plan, etc.

I wouldn't have thought that real-time consumers (which never care about
past, stored data) would be the norm.  (They aren't in our deployment, anyway.)

The console consumer case probably makes no difference either way, since in
that case you are usually starting with a fresh topic queue regardless (and
there I think it's a better user experience to have messages delivered even
if sent from the producer console, before the consumer console is fired up
for the first time).

It seems intuitive to me that by default, I should be able to send a message
to a topic, then consume that message from that topic (without explicitly
having to set "auto.offset.reset" -> "smallest").
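
To make the send-then-consume scenario concrete, here is a toy in-memory model. It is not Kafka code: the class and method names are hypothetical, and it assumes the log starts at offset 0 and that a stored offset of -1 means the group has never committed one. It shows that the reset policy only matters when no usable stored offset exists.

```java
import java.util.ArrayList;
import java.util.List;

// Toy in-memory model (not Kafka code) of where a consumer starts reading.
class OffsetResetModel {

    // Returns the offset at which a consumer begins reading.
    // storedOffset < 0 models "no offset stored for this group yet".
    static int startOffset(int logSize, int storedOffset, String resetPolicy) {
        if (storedOffset >= 0 && storedOffset <= logSize) {
            return storedOffset; // a valid stored offset always wins
        }
        // No usable stored offset: the reset policy decides.
        return "smallest".equals(resetPolicy) ? 0 : logSize;
    }

    // Returns the messages a consumer would see on startup.
    static List<String> consume(List<String> log, int storedOffset, String resetPolicy) {
        int start = startOffset(log.size(), storedOffset, resetPolicy);
        return new ArrayList<>(log.subList(start, log.size()));
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        log.add("hello"); // produced before the consumer first starts
        System.out.println(consume(log, -1, "smallest")); // prints [hello]
        System.out.println(consume(log, -1, "largest"));  // prints []
    }
}
```

With 'largest', a brand-new consumer group sees nothing that was sent before it started; with 'smallest', it sees the earlier message.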

Jason
On Tue, Jun 18, 2013 at 12:49 PM, Jun Rao <[EMAIL PROTECTED]> wrote:

> Hi, Jason,
>
> The reasons for this change are: (1) This is probably the most convenient
> setting for people to migrate from 0.7 to 0.8. The process is to build an
> 0.8 shadow cluster using our migration tool, upgrade all consumers to 0.8,
> and finally upgrade all producers to 0.8. Since most consumers are likely
> real time, when moving from 0.7 to 0.8, it's better for them to pick up the
> latest offset in 0.8 so that they don't get too many duplicates (a small
> number of messages could be lost for those consumers). (2) This
> matches the default behavior of console consumer which is the first thing
> that most new users experience. Does that make sense?
>
> Thanks,
>
> Jun
>
>
> On Tue, Jun 18, 2013 at 9:02 AM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:
>
> > I'm wondering why the default setting for auto.offset.reset in the
> > ConsumerConfig class was changed from 'smallest' to 'largest', so late in
> > the game (looks like a commit on June 3 changed the default).  This is an
> > extremely major change, I should think.  Consumers now by default only get
> > messages newer than when the consumer starts?
> >
> > What use cases are there for that?  I can think of one-off cases, where you
> > just want to start consuming the latest feed, etc., to bootstrap things.
> >  But in the normal case, where you want to take a consumer down for an
> > update, and bring it back up, you'd always be losing messages in that case.
> >
> > The default of the old (now renamed) "autooffset.reset" property in 0.7.2
> > is "smallest", so this is a major change.
> >
> > Sadly, this one change caused me many hours of consternation with some
> > broken tests (e.g. KAFKA-945).  I don't see any mention of this change
> > listed in any messages to the group, etc.
> >
> > It might be good to have a configuration migration page outlining changes
> > from 0.7.2.  This change is particularly difficult since it affects a
> > default that most people had been relying on (and will now have to
> > override in most cases).
> >
> > Jason
> >
>