Kafka, mail # dev - Re: Offset commit api


Re: Offset commit api
Jay Kreps 2012-12-20, 22:05
Err, to clarify, I meant punt on persisting the metadata not punt on
persisting the offset. Basically that field would be in the protocol but
would be unused in this phase.

-Jay
On Thu, Dec 20, 2012 at 2:03 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:

> I actually recommend we just punt on implementing persistence in zk
> entirely, otherwise we have to have an upgrade path to grandfather over
> existing zk data to the new format. Let's just add it in the API and only
> actually store it out when we redo the backend. We can handle the size
> limit then too.
>
> -Jay
>
>
> On Thu, Dec 20, 2012 at 1:30 PM, David Arthur <[EMAIL PROTECTED]> wrote:
>
>> No particular objection, though in order to support atomic writes of
>> (offset, metadata), we will need to define a protocol for the ZooKeeper
>> payloads. Something like:
>>
>>   OffsetPayload => Offset [Metadata]
>>   Metadata => length prefixed string
>>
>> should suffice. Otherwise we would have to rely on the multi-write
>> mechanism to keep parallel znodes in sync (I generally don't like things
>> like this).
>>
>> +1 for limiting the size (1kb sounds reasonable)
>>
>>
>> On 12/20/12 4:03 PM, Jay Kreps wrote:
>>
>>> Okay I did some assessment of use cases we have which aren't using the
>>> default offset storage API and came up with one generalization. I would
>>> like to propose--add a generic metadata field to the offset api on a
>>> per-partition basis. So that would leave us with the following:
>>>
>>> OffsetCommitRequest => ConsumerGroup [TopicName [Partition Offset
>>> Metadata]]
>>>
>>> OffsetFetchResponse => [TopicName [Partition Offset Metadata ErrorCode]]
>>>
>>>    Metadata => string
>>>
>>> If you want to store a reference to any associated state (say an HDFS
>>> file name) so that if the consumption fails over the new consumer can
>>> start up with the same state, this would be a place to store that. It
>>> would not be intended to support large stuff (we could enforce a 1k limit
>>> or something), just something small or a reference on where to find the
>>> state (say a file name).
>>>
>>> Objections?
>>>
>>> -Jay
>>>
>>>
>>> On Mon, Dec 17, 2012 at 10:45 AM, Jay Kreps <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hey Guys,
>>>>
>>>> David has made a bunch of progress on the offset commit api
>>>> implementation.
>>>>
>>>> Since this is a public API it would be good to do as much thinking
>>>> up-front as possible to minimize future iterations.
>>>>
>>>> It would be great if folks could do the following:
>>>> 1. Read the wiki here:
>>>> https://cwiki.apache.org/confluence/display/KAFKA/Offset+Management
>>>> 2. Check out the code David wrote here:
>>>> https://issues.apache.org/jira/browse/KAFKA-657
>>>>
>>>> In particular our hope is that this API can act as the first step in
>>>> scaling the way we store offsets (ZK is not really very appropriate for
>>>> this). This of course requires having some plan in mind for offset
>>>> storage. I have written (and then, after getting some initial feedback,
>>>> rewritten) a section in the above wiki on how this might work.
>>>>
>>>> If no one says anything I will be taking a slightly modified patch that
>>>> adds this functionality on trunk as soon as David gets in a few minor
>>>> tweaks.
>>>>
>>>> -Jay
>>>>
>>>>
>>
>
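
For readers following the thread, here is a minimal sketch of the single-znode payload David outlines (OffsetPayload => Offset [Metadata], with Metadata as a length-prefixed string). The thread does not fix the details, so everything concrete below is an assumption for illustration: an 8-byte big-endian offset, a 2-byte length prefix, UTF-8 metadata, a 1024-byte cap, and the function names themselves. It is not the format that was eventually implemented.

```python
import struct
from typing import Optional, Tuple

# Assumed cap; the thread only suggests "1k" / "1kb" as a reasonable limit.
MAX_METADATA_BYTES = 1024


def encode_offset_payload(offset: int, metadata: Optional[str] = None) -> bytes:
    """Pack an offset and optional metadata into one byte string.

    Writing both values in a single payload lets one znode write update
    them together, instead of keeping parallel znodes in sync.
    """
    payload = struct.pack(">q", offset)  # assumed: signed 64-bit, big-endian
    if metadata is not None:
        raw = metadata.encode("utf-8")
        if len(raw) > MAX_METADATA_BYTES:
            raise ValueError("metadata exceeds %d bytes" % MAX_METADATA_BYTES)
        payload += struct.pack(">h", len(raw)) + raw  # assumed: 16-bit length prefix
    return payload


def decode_offset_payload(payload: bytes) -> Tuple[int, Optional[str]]:
    """Inverse of encode_offset_payload; metadata is None when absent."""
    (offset,) = struct.unpack_from(">q", payload, 0)
    if len(payload) == 8:
        return offset, None
    (length,) = struct.unpack_from(">h", payload, 8)
    raw = payload[10:10 + length]
    if len(raw) != length:
        raise ValueError("truncated metadata")
    return offset, raw.decode("utf-8")
```

A consumer could, for example, store an HDFS file name as the metadata next to the offset and read both back after a failover. Under Jay's follow-up, the metadata field would exist in the API but simply go unwritten until the storage backend is redone.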