Throwing a +1 on "Allow the consumer to reset its offset to some arbitrary value, and then write that offset into ZK".
We're currently running into a scenario where we need 100% reliability, but we lose a few messages whenever a connection breaks while messages are still sitting in the OS TCP buffers. So we're planning to shift the ZK offset a few seconds "back in time" when we detect that a broker has gone down. That way every message is actually delivered to the end consumer once that broker comes back up, even if it means a small number of messages are delivered twice.
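Roughly what I have in mind, as a minimal sketch rather than working consumer code. It assumes the consumer remembers when it committed each offset, and it uses an in-memory dict as a stand-in for the ZK offset node. OffsetStore, RewindingConsumer and on_broker_down are hypothetical names, not part of any Kafka API.

```python
import bisect


class OffsetStore:
    """In-memory stand-in for the ZK offset node (hypothetical, not a Kafka API)."""

    def __init__(self):
        self._offsets = {}

    def read(self, partition):
        return self._offsets.get(partition, 0)

    def write(self, partition, offset):
        self._offsets[partition] = offset


class RewindingConsumer:
    """Tracks (timestamp, offset) commits so the offset can be moved back in time."""

    def __init__(self, store, partition, rewind_seconds):
        self.store = store
        self.partition = partition
        self.rewind_seconds = rewind_seconds
        self._times = []    # commit timestamps, ascending
        self._offsets = []  # offset committed at each timestamp

    def commit(self, offset, now):
        # Record when this offset was committed, then persist it.
        self._times.append(now)
        self._offsets.append(offset)
        self.store.write(self.partition, offset)

    def on_broker_down(self, now):
        # Find the last offset committed at or before (now - rewind_seconds).
        cutoff = now - self.rewind_seconds
        i = bisect.bisect_right(self._times, cutoff)
        # Assumption: with no commit old enough, fall back to offset 0.
        rewound = self._offsets[i - 1] if i > 0 else 0
        # Only ever move the stored offset backwards, never forwards.
        rewound = min(rewound, self.store.read(self.partition))
        self.store.write(self.partition, rewound)
        return rewound


store = OffsetStore()
consumer = RewindingConsumer(store, "topic-0", rewind_seconds=5.0)
consumer.commit(100, now=10.0)
consumer.commit(200, now=12.0)
consumer.commit(300, now=20.0)
rewound_to = consumer.on_broker_down(now=21.0)
```

Everything between the rewound offset and the last committed one gets consumed again, so the end consumer has to tolerate duplicates.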
On Jun 14, 2012, at 2:39 PM, Evan Chan wrote:
> I would like to throw in a couple use cases:
> - Allow the new consumer to reset its offset to either the current
> largest or smallest. This would be a great way to restart a process that
> has fallen behind. The only way I know how to do this today, with the
> high-level consumer, is to delete the ZK nodes manually and restart the
> - Allow the consumer to reset its offset to some arbitrary value, and
> then write that offset into ZK. Kind of like the first case, but would
> make rewinding/replays much easier.
> Modularity (the ability to layer the ZK infrastructure on top of the simple
> interface) would be great.
> On Tue, Jun 12, 2012 at 9:59 AM, Jay Kreps <[EMAIL PROTECTED]> wrote:
>> This is a great summary Neha. It would be good to get people's feedback on
>> this since we don't want to keep breaking API and
>> protocol compatibility here, so the hope is to really get it right this
>> time now that we have really seen all the use cases and lived with the
>> output for a while. I think the consumer design is a pretty hard protocol
>> and API design problem, so it's fun to think about.
>> If I were to summarize Neha's requirements list, I think there are three
>> high-level goals:
>> 1. Simplify the consumer protocol to enable ease of development of
>> consumer clients in other languages
>> 2. Try to replace the "simple consumer" and "high level consumer" with a
>> single, general interface that has all the advantages of both.
>> 3. Support a bunch of use cases that either we didn't think of, or that
>> weren't possible in the partitioning model of the pre-0.8 code base.
>> On Mon, Jun 11, 2012 at 4:52 PM, Neha Narkhede <[EMAIL PROTECTED]> wrote:
>>> Over the past few months, we've received quite a lot of feedback on the
>>> consumer side features and design. Some of them are improvements to the
>>> current consumer design and some are simply new feature/API requests. I
>>> have attempted to write up the requirements that I've heard on this wiki
>>> This would involve some significant changes to the consumer APIs, so we
>>> would like to collect feedback on the proposal from our community. Since
>>> the list of changes is not small, we would like to understand if some
>>> features are preferred over others, and more importantly, if some
>>> are not required at all.
>>> Since some part of this proposal is experimental and the consumer side
>>> changes are non-trivial, we would like this initiative to not interfere
>>> with the forthcoming replication release. However, it will be good to have
>>> people from the community give this some thought and help out with the
>>> JIRAs if interested. One way of managing this project could be creating a
>>> separate branch from the kafka trunk and continue development on it. Once
>>> it is ready and in good shape, we can think about cutting another release
>>> (after 0.8) for releasing the new consumer API. Do people have
>>> preferences/concerns regarding creating a separate branch for this
>>> Please feel free to start a discussion on this JIRA -