Kafka >> mail # dev >> Consumer re-design proposal


Neha Narkhede 2012-06-11, 23:52
Jay Kreps 2012-06-12, 16:59
Evan Chan 2012-06-14, 20:39
Re: Consumer re-design proposal
Throwing a +1 on "Allow the consumer to reset its offset to some arbitrary value, and then write that offset into ZK".

We're currently running into a scenario where we would like 100% reliability, but we lose a few messages when a connection breaks while some messages are still sitting in the OS TCP buffers. So we're planning to shift the ZK offset "back in time" by a few seconds when we detect that a broker has gone down, so that all messages are actually delivered to the end consumer when that broker comes back up, even if that means a small number of messages are delivered twice.
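
A minimal sketch of that rewind idea, under stated assumptions: the `OffsetStore` class is a hypothetical in-memory stand-in for the offsets kept in ZK, `rewind_on_broker_failure` and `deliver` are illustrative names, and the "few seconds" are assumed to have already been translated into an offset delta. None of this is Kafka or ZooKeeper API.

```python
class OffsetStore:
    """Hypothetical in-memory stand-in for consumer offsets stored in ZK."""

    def __init__(self):
        self._offsets = {}

    def read(self, group, topic, partition):
        # Unknown partitions start at offset 0 in this sketch.
        return self._offsets.get((group, topic, partition), 0)

    def write(self, group, topic, partition, offset):
        self._offsets[(group, topic, partition)] = offset


def rewind_on_broker_failure(store, group, topic, partition, rewind_by, smallest=0):
    """Move the stored offset back by `rewind_by`, never below `smallest`."""
    current = store.read(group, topic, partition)
    new_offset = max(smallest, current - rewind_by)
    store.write(group, topic, partition, new_offset)
    return new_offset


def deliver(messages, seen_ids, handle):
    """Hand messages to `handle`, skipping ids that were already delivered.

    Rewinding causes overlap, so the end consumer needs some way to
    tolerate duplicates; an id set is the simplest illustration.
    """
    for msg_id, payload in messages:
        if msg_id in seen_ids:
            continue
        seen_ids.add(msg_id)
        handle(payload)
```

The de-duplication step is only there to show that the overlap is harmless if the end consumer can recognize repeats; whether messages carry a usable id is itself an assumption.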

Thanks,

Marcos
On Jun 14, 2012, at 2:39 PM, Evan Chan wrote:

> I would like to throw in a couple use cases:
>
>
>   - Allow the new consumer to reset its offset to either the current
>   largest or smallest.  This would be a great way to restart a process that
>   has fallen behind.  The only way I know how to do this today, with the
>   high-level consumer, is to delete the ZK nodes manually and restart the
>   consumer.
>   - Allow the consumer to reset its offset to some arbitrary value, and
>   then write that offset into ZK.    Kind of like the first case, but would
>   make rewinding/replays much easier.
>
> Modularity (the ability to layer the ZK infrastructure on top of the simple
> interface) would be great.
>
> thanks,
> Evan
>
>
> On Tue, Jun 12, 2012 at 9:59 AM, Jay Kreps <[EMAIL PROTECTED]> wrote:
>
>> This is a great summary, Neha. It would be good to get people's feedback on
>> this, since we don't want to keep breaking API and
>> protocol compatibility here. The hope is to really get it right this
>> time, now that we have seen all the use cases and lived with the
>> output for a while. I think the consumer design is a pretty hard protocol
>> and API design problem, so it's fun to think about.
>>
>> If I were to summarize Neha's requirements list, I think there are three
>> high-level goals:
>>
>>  1. Simplify the consumer protocol to enable ease of development of
>>  consumer clients in other languages
>>  2. Try to replace the "simple consumer" and "high level consumer" with a
>>  single, general interface that has all the advantages of both.
>>  3. Support a bunch of use cases that either we didn't think of, or that
>>  weren't possible in the partitioning model of the pre-0.8 code base.
>>
>> -Jay
>>
>>
>> On Mon, Jun 11, 2012 at 4:52 PM, Neha Narkhede <[EMAIL PROTECTED]> wrote:
>>
>>> Hi,
>>>
>>> Over the past few months, we've received quite a lot of feedback on the
>>> consumer side features and design. Some of them are improvements to the
>>> current consumer design and some are simply new feature/API requests. I
>>> have attempted to write up the requirements that I've heard on this wiki -
>>> https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Client+Re-Design
>>>
>>> This would involve some significant changes to the consumer APIs, so we
>>> would like to collect feedback on the proposal from our community. Since
>>> the list of changes is not small, we would like to understand if some
>>> features are preferred over others, and more importantly, if some features
>>> are not required at all.
>>>
>>> Since some part of this proposal is experimental and the consumer side
>>> changes are non-trivial, we would like this initiative to not interfere
>>> with the forthcoming replication release. However, it will be good to have
>>> people from the community give this some thought and help out with the
>>> JIRAs if interested. One way of managing this project could be creating a
>>> separate branch from the kafka trunk and continuing development on it. Once
>>> it is ready and in good shape, we can think about cutting another release
>>> (after 0.8) for releasing the new consumer API. Do people have
>>> preferences/concerns regarding creating a separate branch for this project?
>>>
>>> Please feel free to start a discussion on this JIRA -
>>> https
Neha Narkhede 2012-06-14, 21:53
Sybrandy, Casey 2012-06-18, 18:40
Chris Burroughs 2012-06-19, 02:00
Neha Narkhede 2012-06-19, 17:09
Dave Barr 2012-06-20, 05:49
Ross Black 2012-06-20, 13:30
Jun Rao 2012-06-14, 16:08