Kafka, mail # dev - Re: Offset commit api


Re: Offset commit api
Jay Kreps 2012-12-20, 22:18
Yeah, I just meant that as an example; it should be configurable.

-Jay
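A minimal sketch of the single-znode ZooKeeper payload David describes further down (`OffsetPayload => Offset [Metadata]`), combined with the configurable metadata size limit discussed in this thread. Packing offset and metadata into one value is what lets them be written atomically without ZooKeeper's multi-write. Everything concrete here is an assumption for illustration, not the KAFKA-657 code: the byte layout (big-endian int64 offset followed by an int16 length-prefixed UTF-8 string, the way Kafka's wire protocol encodes strings), treating `[Metadata]` as always present but possibly empty, the hypothetical names `encode_offset_payload`, `decode_offset_payload` and `MetadataTooLargeError`, and the 1024-byte default.

```python
import struct
from typing import Tuple

# Hypothetical default: the thread suggests ~1 KB (2 KB was also proposed)
# and agrees the limit should be configurable.
DEFAULT_MAX_METADATA_BYTES = 1024

# Assumed layout: big-endian signed int64 offset, then an int16 byte length.
_HEADER = struct.Struct(">qh")


class MetadataTooLargeError(ValueError):
    """Raised when the metadata exceeds the configured size limit."""


def encode_offset_payload(offset: int, metadata: str = "",
                          max_metadata_bytes: int = DEFAULT_MAX_METADATA_BYTES) -> bytes:
    """Pack (offset, metadata) into one byte string for a single znode."""
    data = metadata.encode("utf-8")
    if len(data) > max_metadata_bytes:
        raise MetadataTooLargeError(
            f"metadata is {len(data)} bytes; limit is {max_metadata_bytes}")
    if len(data) > 0x7FFF:
        # An int16 length prefix cannot describe more than 32767 bytes.
        raise MetadataTooLargeError("metadata does not fit an int16 length prefix")
    return _HEADER.pack(offset, len(data)) + data


def decode_offset_payload(payload: bytes) -> Tuple[int, str]:
    """Inverse of encode_offset_payload; returns (offset, metadata)."""
    offset, length = _HEADER.unpack_from(payload, 0)
    data = payload[_HEADER.size:_HEADER.size + length]
    if length < 0 or len(data) != length:
        raise ValueError("malformed or truncated metadata")
    return offset, data.decode("utf-8")
```

For example, with a hypothetical state-file path, `decode_offset_payload(encode_offset_payload(42, "/data/consumer/state-0001"))` round-trips to `(42, "/data/consumer/state-0001")`. A 2,048-byte metadata string is rejected under the default limit but accepted once `max_metadata_bytes` is raised to 2048, which is the kind of configurability Jay describes here.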
On Thu, Dec 20, 2012 at 2:15 PM, Milind Parikh <[EMAIL PROTECTED]> wrote:

> +1 on limiting the size. But could you do 2k instead of 1k? Using Interval
> Tree Clocks gets you a lot in distributed autonomous processing, but most
> large-scale ITCs go up to 1.5K.
>
> http://code.google.com/p/itclocks/ (refer to the link to the conference
> paper).
>
>
> Regards
> Milind
>
>
>
> On Thu, Dec 20, 2012 at 2:04 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
>
> > Err, to clarify, I meant punt on persisting the metadata, not punt on
> > persisting the offset. Basically that field would be in the protocol but
> > would be unused in this phase.
> >
> > -Jay
> >
> >
> > On Thu, Dec 20, 2012 at 2:03 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
> >
> > > I actually recommend we just punt on implementing persistence in zk
> > > entirely; otherwise we would need an upgrade path to grandfather
> > > existing zk data into the new format. Let's just add it to the API and
> > > only actually store it when we redo the backend. We can handle the size
> > > limit then too.
> > >
> > > -Jay
> > >
> > >
> > > On Thu, Dec 20, 2012 at 1:30 PM, David Arthur <[EMAIL PROTECTED]> wrote:
> > >
> > >> No particular objection, though in order to support atomic writes of
> > >> (offset, metadata), we will need to define a protocol for the ZooKeeper
> > >> payloads. Something like:
> > >>
> > >>   OffsetPayload => Offset [Metadata]
> > >>   Metadata => length prefixed string
> > >>
> > >> should suffice. Otherwise we would have to rely on the multi-write
> > >> mechanism to keep parallel znodes in sync (I generally don't like things
> > >> like this).
> > >>
> > >> +1 for limiting the size (1kb sounds reasonable)
> > >>
> > >>
> > >> On 12/20/12 4:03 PM, Jay Kreps wrote:
> > >>
> > >>> Okay, I did some assessment of use cases we have that aren't using the
> > >>> default offset storage API and came up with one generalization. I would
> > >>> like to propose adding a generic metadata field to the offset API on a
> > >>> per-partition basis. That would leave us with the following:
> > >>>
> > >>> OffsetCommitRequest => ConsumerGroup [TopicName [Partition Offset Metadata]]
> > >>>
> > >>> OffsetFetchResponse => [TopicName [Partition Offset Metadata ErrorCode]]
> > >>>
> > >>>    Metadata => string
> > >>>
> > >>> If you want to store a reference to any associated state (say, an HDFS
> > >>> file name) so that if consumption fails over, the new consumer can start
> > >>> up with the same state, this would be the place to store it. It would not
> > >>> be intended to support large things (we could enforce a 1k limit or
> > >>> something); just something small, or a reference to where to find the
> > >>> state (say, a file name).
> > >>>
> > >>> Objections?
> > >>>
> > >>> -Jay
> > >>>
> > >>>
> > >>> On Mon, Dec 17, 2012 at 10:45 AM, Jay Kreps <[EMAIL PROTECTED]> wrote:
> > >>>
> > >>>> Hey Guys,
> > >>>>
> > >>>> David has made a bunch of progress on the offset commit API
> > >>>> implementation.
> > >>>>
> > >>>> Since this is a public API, it would be good to do as much thinking
> > >>>> up front as possible to minimize future iterations.
> > >>>>
> > >>>> It would be great if folks could do the following:
> > >>>> 1. Read the wiki here:
> > >>>> https://cwiki.apache.org/confluence/display/KAFKA/Offset+Management
> > >>>> 2. Check out the code David wrote here:
> > >>>> https://issues.apache.org/jira/browse/KAFKA-657
> > >>>>
> > >>>> In particular, our hope is that this API can act as the first step in
> > >>>> scaling the way we store offsets (ZK is not really very appropriate for
> > >>>> this). This of course requires having some plan in mind for offset
> > >>>> storage.
> > >>>> I have written (and then after getting some initial feedback,