Kafka >> mail # user >> why was auto.offset.reset default changed (so late in the game)?


Jason Rosenberg 2013-06-18, 16:02
Jun Rao 2013-06-18, 19:50
Jason Rosenberg 2013-06-18, 20:28
Re: why was auto.offset.reset default changed (so late in the game)?
Jason,

If we default to smallest and a consumer doesn't override this, when it
migrates to 0.8, it will likely reconsume a lot of data. Quite a few
consumers are real time since they feed data to systems like Storm.

What do you think is the best way to communicate such config changes in the
future? It's discussed in the jira a bit. We can send it to the user
mailing list. However, I am not sure if everyone is paying attention to all
the emails.

Thanks

Jun
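
For readers who want to pin the behavior rather than rely on whichever default ships, the override mentioned above can be sketched as follows. This is a minimal illustration, not code from the thread: the class `ConsumerPropsSketch`, the helper `consumerProps`, and the host and group values are hypothetical, and the `zookeeper.connect` and `group.id` keys are assumed to be the 0.8 names. Only `auto.offset.reset` and its values `smallest` and `largest` come from the discussion.

```java
import java.util.Properties;

public class ConsumerPropsSketch {

    // Hypothetical helper: builds consumer properties with an explicit
    // reset policy so the consumer does not depend on the library default.
    static Properties consumerProps(String zkConnect, String groupId, String offsetReset) {
        if (!"smallest".equals(offsetReset) && !"largest".equals(offsetReset)) {
            throw new IllegalArgumentException(
                "auto.offset.reset must be 'smallest' or 'largest', got: " + offsetReset);
        }
        Properties props = new Properties();
        // Assumed 0.8 key names for the ZooKeeper address and consumer group.
        props.setProperty("zookeeper.connect", zkConnect);
        props.setProperty("group.id", groupId);
        // The setting discussed in this thread, stated explicitly.
        props.setProperty("auto.offset.reset", offsetReset);
        return props;
    }

    public static void main(String[] args) {
        // Placeholder values for illustration only.
        Properties props = consumerProps("localhost:2181", "example-group", "smallest");
        System.out.println("auto.offset.reset=" + props.getProperty("auto.offset.reset"));
        // In a real 0.8 application these properties would be handed to the
        // consumer configuration class (ConsumerConfig, as named in the thread).
    }
}
```

Setting the value explicitly in every consumer means a later change to the default cannot silently alter behavior.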

On Tue, Jun 18, 2013 at 1:28 PM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:

> Jun,
>
> I just think it's a major change, that might have warranted a heads up, but
> it's all good (I've got things looking good with it now).  I simply changed
> the default behavior everywhere to explicitly initialize consumers with
> 'smallest' by default (whereas previously this was an ignored config
> value).
>
> I don't quite see the migration use case as a rationale for the change.
> In the steady state, after people are done migrating from 0.7 to 0.8,
> this default setting will be the norm for a long time to come.
> Instead, it would make sense for people to use this mode explicitly when
> building a migration plan, etc.
>
> I wouldn't have thought that real-time consumers (which never care about
> past, stored data) would be the norm.  (It's not in our deployment anyway).
>
> The console consumer case probably makes no difference either way, since in
> that case you are usually starting with a fresh topic queue regardless (and
> there I think it's a better user experience to have messages delivered even
> if sent from the producer console, before the consumer console is fired up
> for the first time).
>
> It seems intuitive to me that by default, I should be able to send a message
> to a topic, then consume that message from that topic (without explicitly
> having to set "auto.offset.reset" -> "smallest").
>
> Jason
>
>
> On Tue, Jun 18, 2013 at 12:49 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
>
> > Hi, Jason,
> >
> > The reasons for this change are: (1) This is probably the most convenient
> > setting for people to migrate from 0.7 to 0.8. The process is to build an
> > 0.8 shadow cluster using our migration tool, upgrade all consumers to 0.8,
> > and finally upgrade all producers to 0.8. Since most consumers are likely
> > real time, when moving from 0.7 to 0.8, it's better for them to pick up the
> > latest offset in 0.8 so that they don't get too many duplicates (there
> > could be a small amount of message loss for those consumers). (2) This
> > matches the default behavior of the console consumer, which is the first
> > thing that most new users experience. Does that make sense?
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Tue, Jun 18, 2013 at 9:02 AM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:
> >
> > > I'm wondering why the default setting for auto.offset.reset in the
> > > ConsumerConfig class was changed from 'smallest' to 'largest', so late in
> > > the game (looks like a commit on June 3 changed the default).  This is an
> > > extremely major change, I should think.  Consumers now by default only get
> > > messages newer than when the consumer starts?
> > >
> > > What use cases are there for that?  I can think of one-off cases, where you
> > > just want to start consuming the latest feed, etc., to bootstrap things.
> > >  But in the normal case, where you want to take a consumer down for an
> > > update, and bring it back up, you'd always be losing messages in that case.
> > >
> > > The default of the old (now renamed) "autooffset.reset" property in 0.7.2
> > > is "smallest", so this is a major change.
> > >
> > > Sadly, this one change caused me many hours of consternation with some
> > > broken tests (e.g. KAFKA-945).  I don't see any mention of this change
> > > listed in any messages to the group, etc.
> > >
> > > It might be good to have a configuration migration page outlining changes
> > > from 0.7.2.  This change is particularly difficult since it is in a
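
As a rough aid to the question of which messages a consumer sees under each policy, the sketch below models the choice of starting offset. It is a simplified model under stated assumptions, not Kafka's implementation: the class `OffsetResetSketch`, the method `startingOffset`, and the offset numbers are all hypothetical. It assumes the reset policy is consulted only when the consumer group has no stored offset or the stored offset falls outside the range the log still holds.

```java
import java.util.OptionalLong;

public class OffsetResetSketch {

    // Simplified model: pick the offset a consumer starts reading from.
    // logStart is the earliest offset still available; logEnd is the offset
    // the next produced message would receive.
    static long startingOffset(OptionalLong committed, long logStart, long logEnd,
                               String resetPolicy) {
        // A stored offset inside the available range wins; the policy is not consulted.
        if (committed.isPresent()) {
            long c = committed.getAsLong();
            if (c >= logStart && c <= logEnd) {
                return c;
            }
        }
        // No usable stored offset: fall back to the reset policy.
        switch (resetPolicy) {
            case "smallest":
                return logStart; // read everything still retained
            case "largest":
                return logEnd;   // read only messages produced from now on
            default:
                throw new IllegalArgumentException("unknown policy: " + resetPolicy);
        }
    }

    public static void main(String[] args) {
        // A brand-new consumer group on a topic holding offsets 0..99.
        System.out.println(startingOffset(OptionalLong.empty(), 0, 100, "smallest"));
        System.out.println(startingOffset(OptionalLong.empty(), 0, 100, "largest"));
        // A restarted consumer whose stored offset 42 is still in range.
        System.out.println(startingOffset(OptionalLong.of(42), 0, 100, "largest"));
    }
}
```

Under this model, the policy matters mainly for consumer groups that are new or whose stored offset has aged out of the log. A consumer that committed its position before going down resumes from that position under either setting.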
