Kafka >> mail # dev >> Re: Offset commit api


Jun Rao 2012-12-18, 16:06
Jay Kreps 2012-12-18, 16:23
Jay Kreps 2012-12-20, 21:04
David Arthur 2012-12-20, 21:31
Jay Kreps 2012-12-20, 22:04
Jay Kreps 2012-12-20, 22:05
Re: Offset commit api
Sounds good to me

On 12/20/12 5:04 PM, Jay Kreps wrote:
> Err, to clarify, I meant punt on persisting the metadata not punt on
> persisting the offset. Basically that field would be in the protocol but
> would be unused in this phase.
>
> -Jay
>
>
> On Thu, Dec 20, 2012 at 2:03 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
>
>> I actually recommend we just punt on implementing persistence in zk
>> entirely, otherwise we have to have an upgrade path to grandfather over
>> existing zk data to the new format. Let's just add it in the API and only
>> actually store it out when we redo the backend. We can handle the size
>> limit then too.
>>
>> -Jay
>>
>>
>> On Thu, Dec 20, 2012 at 1:30 PM, David Arthur <[EMAIL PROTECTED]> wrote:
>>
>>> No particular objection, though in order to support atomic writes of
>>> (offset, metadata), we will need to define a protocol for the ZooKeeper
>>> payloads. Something like:
>>>
>>>    OffsetPayload => Offset [Metadata]
>>>    Metadata => length prefixed string
>>>
>>> should suffice. Otherwise we would have to rely on the multi-write
>>> mechanism to keep parallel znodes in sync (I generally don't like things
>>> like this).
>>>
>>> +1 for limiting the size (1kb sounds reasonable)
>>>
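A minimal sketch of the single-znode payload described in the quoted message above (OffsetPayload => Offset [Metadata], with Metadata as a length-prefixed string). The thread does not specify field widths or encodings, so the 8-byte big-endian offset, the 2-byte length prefix, the UTF-8 encoding, and the function names are all assumptions for illustration. The 1 KB cap follows the limit suggested in the thread.

```python
import struct

# Assumed cap, following the "1kb" suggestion in the thread.
MAX_METADATA_BYTES = 1024

OFFSET_SIZE = 8   # assumed: signed 64-bit big-endian offset
LENGTH_SIZE = 2   # assumed: unsigned 16-bit big-endian length prefix


def encode_offset_payload(offset, metadata=None):
    """Pack offset and optional metadata into one byte string.

    Writing this as a single znode value updates both fields together,
    so no multi-write across parallel znodes is needed.
    """
    payload = struct.pack(">q", offset)
    if metadata is not None:
        raw = metadata.encode("utf-8")
        if len(raw) > MAX_METADATA_BYTES:
            raise ValueError("metadata exceeds %d bytes" % MAX_METADATA_BYTES)
        payload += struct.pack(">H", len(raw)) + raw
    return payload


def decode_offset_payload(payload):
    """Return (offset, metadata); metadata is None when it was omitted."""
    (offset,) = struct.unpack_from(">q", payload, 0)
    if len(payload) == OFFSET_SIZE:
        return offset, None
    (length,) = struct.unpack_from(">H", payload, OFFSET_SIZE)
    start = OFFSET_SIZE + LENGTH_SIZE
    raw = payload[start:start + length]
    if len(raw) != length:
        raise ValueError("truncated metadata")
    return offset, raw.decode("utf-8")
```

A payload with no metadata is just the 8 offset bytes, which is how the optional `[Metadata]` part of the grammar is represented in this sketch.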
>>>
>>> On 12/20/12 4:03 PM, Jay Kreps wrote:
>>>
>>>> Okay I did some assessment of use cases we have which aren't using the
>>>> default offset storage API and came up with one generalization. I would
>>>> like to propose--add a generic metadata field to the offset api on a
>>>> per-partition basis. So that would leave us with the following:
>>>>
>>>> OffsetCommitRequest => ConsumerGroup [TopicName [Partition Offset
>>>> Metadata]]
>>>>
>>>> OffsetFetchResponse => [TopicName [Partition Offset Metadata ErrorCode]]
>>>>
>>>>     Metadata => string
>>>>
>>>> If you want to store a reference to any associated state (say an HDFS
>>>> file name) so that if the consumption fails over the new consumer can
>>>> start up with the same state, this would be a place to store that. It
>>>> would not be intended to support large stuff (we could enforce a 1k limit
>>>> or something), just something small or a reference on where to find the
>>>> state (say a file name).
>>>>
>>>> Objections?
>>>>
>>>> -Jay
>>>>
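A rough sketch of the request and response shapes proposed in the quoted message above, with the per-partition metadata field, and of the failover use case it is meant for. This is an in-memory model, not Kafka code: the class names, the error code values, and the topic, group, and file names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumed error code values; the thread does not define them.
NO_ERROR = 0
UNKNOWN_PARTITION = 1


@dataclass
class PartitionCommit:
    # One [Partition Offset Metadata] entry of OffsetCommitRequest.
    partition: int
    offset: int
    metadata: str = ""


@dataclass
class OffsetCommitRequest:
    # OffsetCommitRequest => ConsumerGroup [TopicName [Partition Offset Metadata]]
    consumer_group: str
    topics: Dict[str, List[PartitionCommit]]


@dataclass
class PartitionFetch:
    # One [Partition Offset Metadata ErrorCode] entry of OffsetFetchResponse.
    partition: int
    offset: int
    metadata: str
    error_code: int


class InMemoryOffsetStore:
    """Toy stand-in for the offset storage backend."""

    def __init__(self):
        # (group, topic, partition) -> (offset, metadata)
        self._offsets = {}

    def commit(self, request):
        for topic, parts in request.topics.items():
            for p in parts:
                key = (request.consumer_group, topic, p.partition)
                self._offsets[key] = (p.offset, p.metadata)

    def fetch(self, group, topic_partitions):
        """Return {topic: [PartitionFetch, ...]} for the requested partitions."""
        response = {}
        for topic, partitions in topic_partitions.items():
            entries = []
            for partition in partitions:
                stored = self._offsets.get((group, topic, partition))
                if stored is None:
                    entries.append(PartitionFetch(partition, -1, "", UNKNOWN_PARTITION))
                else:
                    entries.append(PartitionFetch(partition, stored[0], stored[1], NO_ERROR))
            response[topic] = entries
        return response


# The first consumer commits its offset plus a pointer to its state.
store = InMemoryOffsetStore()
store.commit(OffsetCommitRequest(
    "etl-group",
    {"events": [PartitionCommit(0, 1500, "hdfs:///etl/events/part-0000")]},
))

# After failover, the replacement consumer fetches both and resumes.
resumed = store.fetch("etl-group", {"events": [0, 1]})["events"]
```

Here `resumed[0]` carries the committed offset and the file reference, while `resumed[1]` reports the assumed unknown-partition error because nothing was committed for partition 1.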
>>>>
>>>> On Mon, Dec 17, 2012 at 10:45 AM, Jay Kreps <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Hey Guys,
>>>>> David has made a bunch of progress on the offset commit api
>>>>> implementation.
>>>>>
>>>>> Since this is a public API it would be good to do as much thinking
>>>>> up-front as possible to minimize future iterations.
>>>>>
>>>>> It would be great if folks could do the following:
>>>>> 1. Read the wiki here:
>>>>> https://cwiki.apache.org/confluence/display/KAFKA/Offset+Management
>>>>> 2. Check out the code David wrote here:
>>>>> https://issues.apache.org/jira/browse/KAFKA-657
>>>>>
>>>>> In particular our hope is that this API can act as the first step in
>>>>> scaling the way we store offsets (ZK is not really very appropriate for
>>>>> this). This of course requires having some plan in mind for offset
>>>>> storage. I have written (and then, after getting some initial feedback,
>>>>> rewritten) a section in the above wiki on how this might work.
>>>>>
>>>>> If no one says anything I will be taking a slightly modified patch that
>>>>> adds this functionality on trunk as soon as David gets in a few minor
>>>>> tweaks.
>>>>>
>>>>> -Jay
>>>>>
>>>>>
 
Milind Parikh 2012-12-20, 22:15
Jay Kreps 2012-12-20, 22:18