Re: Offset commit api
David Arthur 2012-12-20, 22:09
Sounds good to me
On 12/20/12 5:04 PM, Jay Kreps wrote:
> Err, to clarify, I meant punt on persisting the metadata not punt on
> persisting the offset. Basically that field would be in the protocol but
> would be unused in this phase.
>
> -Jay
>
> On Thu, Dec 20, 2012 at 2:03 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
>
>> I actually recommend we just punt on implementing persistence in zk
>> entirely, otherwise we have to have an upgrade path to grandfather over
>> existing zk data to the new format. Let's just add it in the API and only
>> actually store it out when we redo the backend. We can handle the size
>> limit then too.
>>
>> -Jay
>>
>> On Thu, Dec 20, 2012 at 1:30 PM, David Arthur <[EMAIL PROTECTED]> wrote:
>>
>>> No particular objection, though in order to support atomic writes of
>>> (offset, metadata), we will need to define a protocol for the ZooKeeper
>>> payloads. Something like:
>>>
>>> OffsetPayload => Offset [Metadata]
>>> Metadata => length prefixed string
>>>
>>> should suffice. Otherwise we would have to rely on the multi-write
>>> mechanism to keep parallel znodes in sync (I generally don't like things
>>> like this).
>>>
>>> +1 for limiting the size (1kb sounds reasonable)
>>>
>>> On 12/20/12 4:03 PM, Jay Kreps wrote:
>>>
>>>> Okay I did some assessment of use cases we have which aren't using the
>>>> default offset storage API and came up with one generalization. I would
>>>> like to propose--add a generic metadata field to the offset api on a
>>>> per-partition basis. So that would leave us with the following:
>>>>
>>>> OffsetCommitRequest => ConsumerGroup [TopicName [Partition Offset Metadata]]
>>>>
>>>> OffsetFetchResponse => [TopicName [Partition Offset Metadata ErrorCode]]
>>>>
>>>> Metadata => string
>>>>
>>>> If you want to store a reference to any associated state (say an HDFS
>>>> file name) so that if the consumption fails over the new consumer can
>>>> start up with the same state, this would be a place to store that. It
>>>> would not be intended to support large stuff (we could enforce a 1k
>>>> limit or something, just something small or a reference on where to
>>>> find the state (say a file name).
>>>>
>>>> Objections?
>>>>
>>>> -Jay
>>>>
>>>> On Mon, Dec 17, 2012 at 10:45 AM, Jay Kreps <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Hey Guys,
>>>>>
>>>>> David has made a bunch of progress on the offset commit api
>>>>> implementation.
>>>>>
>>>>> Since this is a public API it would be good to do as much thinking
>>>>> up-front as possible to minimize future iterations.
>>>>>
>>>>> It would be great if folks could do the following:
>>>>> 1. Read the wiki here:
>>>>> https://cwiki.apache.org/confluence/display/KAFKA/Offset+Management
>>>>> 2. Check out the code David wrote here:
>>>>> https://issues.apache.org/jira/browse/KAFKA-657
>>>>>
>>>>> In particular our hope is that this API can act as the first step in
>>>>> scaling the way we store offsets (ZK is not really very appropriate for
>>>>> this). This of course requires having some plan in mind for offset
>>>>> storage. I have written (and then after getting some initial feedback,
>>>>> rewritten) a section in the above wiki on how this might work.
>>>>>
>>>>> If no one says anything I will be taking a slightly modified patch that
>>>>> adds this functionality on trunk as soon as David gets in a few minor
>>>>> tweaks.
>>>>>
>>>>> -Jay
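
For reference, the ZooKeeper payload format proposed in the thread
(OffsetPayload => Offset [Metadata], with Metadata as a length prefixed
string) could be encoded roughly as below. This is only a sketch: the thread
does not specify field widths, so the 8-byte big-endian offset, the 2-byte
length prefix, the UTF-8 encoding, and the function names are all assumptions,
and the 1024-byte cap follows the "1kb" suggestion rather than any decided
limit.

```python
import struct

# Assumed cap, taken from the "1kb sounds reasonable" suggestion in the thread.
MAX_METADATA_BYTES = 1024


def encode_offset_payload(offset, metadata=None):
    """Pack an offset and optional metadata into a single byte string.

    Hypothetical layout: OffsetPayload => Offset [Metadata]
    """
    # Assumption: the offset is a signed 64-bit big-endian integer.
    payload = struct.pack(">q", offset)
    if metadata is not None:
        raw = metadata.encode("utf-8")
        if len(raw) > MAX_METADATA_BYTES:
            raise ValueError("metadata exceeds %d bytes" % MAX_METADATA_BYTES)
        # Assumption: the length prefix is a signed 16-bit big-endian integer.
        payload += struct.pack(">h", len(raw)) + raw
    return payload


def decode_offset_payload(payload):
    """Return (offset, metadata); metadata is None when it was not written."""
    (offset,) = struct.unpack_from(">q", payload, 0)
    if len(payload) == 8:
        return offset, None
    (length,) = struct.unpack_from(">h", payload, 8)
    raw = payload[10:10 + length]
    if len(raw) != length:
        raise ValueError("truncated metadata")
    return offset, raw.decode("utf-8")
```

Because the offset and metadata travel in one byte string, a single znode
write updates both together, which is the atomicity property the proposal was
after, without relying on the multi-write mechanism.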