Kafka >> mail # dev >> Re: Offset commit api


Thread:
- Jun Rao (2012-12-18, 16:06)
- Jay Kreps (2012-12-18, 16:23)
- Jay Kreps (2012-12-20, 21:04)
- David Arthur (2012-12-20, 21:31)
- Jay Kreps (2012-12-20, 22:04)
- Jay Kreps (2012-12-20, 22:05)
- David Arthur (2012-12-20, 22:09)
- Milind Parikh (2012-12-20, 22:15)
Re: Offset commit api
Yeah, I just meant that as an example; it should be configurable.

-Jay
On Thu, Dec 20, 2012 at 2:15 PM, Milind Parikh <[EMAIL PROTECTED]> wrote:

> +1 on limiting the size. But could you do 2k instead of 1k? Using Interval
> Tree Clocks gets you a lot in distributed autonomous processing, but most
> large-scale ITCs go up to 1.5k.
>
> http://code.google.com/p/itclocks/ (see the link to the conference
> paper there).
>
>
> Regards
> Milind
>
>
>
> On Thu, Dec 20, 2012 at 2:04 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
>
> > Err, to clarify, I meant punt on persisting the metadata not punt on
> > persisting the offset. Basically that field would be in the protocol but
> > would be unused in this phase.
> >
> > -Jay
> >
> >
> > On Thu, Dec 20, 2012 at 2:03 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
> >
> > > I actually recommend we just punt on implementing persistence in zk
> > > entirely; otherwise we have to have an upgrade path to grandfather
> > > existing zk data over to the new format. Let's just add it to the API
> > > and only actually store it out when we redo the backend. We can handle
> > > the size limit then too.
> > >
> > > -Jay
> > >
> > >
> > > On Thu, Dec 20, 2012 at 1:30 PM, David Arthur <[EMAIL PROTECTED]>
> wrote:
> > >
> > >> No particular objection, though in order to support atomic writes of
> > >> (offset, metadata), we will need to define a protocol for the
> ZooKeeper
> > >> payloads. Something like:
> > >>
> > >>   OffsetPayload => Offset [Metadata]
> > >>   Metadata => length prefixed string
> > >>
> > >> should suffice. Otherwise we would have to rely on the multi-write
> > >> mechanism to keep parallel znodes in sync (I generally don't like
> things
> > >> like this).
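The length-prefixed layout described above can be sketched in Java. This is not Kafka code; the class and field names are hypothetical, and a 2-byte length prefix is assumed. Packing the offset and metadata into one byte array is what makes the single-znode write atomic, avoiding the multi-write mechanism:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the proposed ZooKeeper payload:
//   OffsetPayload => Offset [Metadata]
//   Metadata => length-prefixed string
public final class OffsetPayload {
    public final long offset;
    public final String metadata; // empty string when absent

    public OffsetPayload(long offset, String metadata) {
        this.offset = offset;
        this.metadata = metadata == null ? "" : metadata;
    }

    // Serialize to a single byte array so one znode write covers both fields.
    public byte[] encode() {
        byte[] meta = metadata.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(8 + 2 + meta.length);
        buf.putLong(offset);               // 8-byte offset
        buf.putShort((short) meta.length); // 2-byte length prefix
        buf.put(meta);                     // UTF-8 metadata bytes
        return buf.array();
    }

    public static OffsetPayload decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long offset = buf.getLong();
        byte[] meta = new byte[buf.getShort()];
        buf.get(meta);
        return new OffsetPayload(offset, new String(meta, StandardCharsets.UTF_8));
    }
}
```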
> > >>
> > >> +1 for limiting the size (1kb sounds reasonable)
> > >>
> > >>
> > >> On 12/20/12 4:03 PM, Jay Kreps wrote:
> > >>
> > >>> Okay, I did some assessment of the use cases we have which aren't
> > >>> using the default offset storage API and came up with one
> > >>> generalization. I would like to propose adding a generic metadata
> > >>> field to the offset api on a per-partition basis. That would leave us
> > >>> with the following:
> > >>>
> > >>> OffsetCommitRequest => ConsumerGroup [TopicName [Partition Offset Metadata]]
> > >>>
> > >>> OffsetFetchResponse => [TopicName [Partition Offset Metadata ErrorCode]]
> > >>>
> > >>>   Metadata => string
> > >>>
> > >>> If you want to store a reference to any associated state (say an HDFS
> > >>> file name) so that if consumption fails over, the new consumer can
> > >>> start up with the same state, this would be the place to store it. It
> > >>> is not intended to support anything large; we could enforce a 1k limit
> > >>> or something. The idea is just something small, or a reference to
> > >>> where to find the state (say a file name).
> > >>>
> > >>> Objections?
> > >>>
> > >>> -Jay
> > >>>
> > >>>
> > >>> On Mon, Dec 17, 2012 at 10:45 AM, Jay Kreps <[EMAIL PROTECTED]>
> > wrote:
> > >>>
> > >>>> Hey Guys,
> > >>>>
> > >>>> David has made a bunch of progress on the offset commit api
> > >>>> implementation.
> > >>>>
> > >>>> Since this is a public API it would be good to do as much thinking
> > >>>> up-front as possible to minimize future iterations.
> > >>>>
> > >>>> It would be great if folks could do the following:
> > >>>> 1. Read the wiki here:
> > >>>>
> > >>>> https://cwiki.apache.org/confluence/display/KAFKA/Offset+Management
> > >>>> 2. Check out the code David wrote here:
> > >>>> https://issues.apache.org/jira/browse/KAFKA-657
> > >>>>
> > >>>> In particular our hope is that this API can act as the first step
> > >>>> in scaling the way we store offsets (ZK is not really very
> > >>>> appropriate for this). This of course requires having some plan in
> > >>>> mind for offset storage.
> > >>>> I have written (and then after getting some initial feedback,

 