Consumer re-design proposal
Neha Narkhede 2012-06-11, 23:52
Hi,
Over the past few months, we've received quite a lot of feedback on the consumer side features and design. Some of the requests are improvements to the current consumer design and some are simply new feature/API requests. I have attempted to write up the requirements that I've heard on this wiki - https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Client+Re-Design
This would involve some significant changes to the consumer APIs, so we would like to collect feedback on the proposal from our community. Since the list of changes is not small, we would like to understand if some features are preferred over others, and more importantly, if some features are not required at all.
Since some parts of this proposal are experimental and the consumer side changes are non-trivial, we would like this initiative to not interfere with the forthcoming replication release. However, it will be good to have people from the community give this some thought and help out with the JIRAs if interested. One way of managing this project could be creating a separate branch from the kafka trunk and continuing development on it. Once it is ready and in good shape, we can think about cutting another release (after 0.8) for releasing the new consumer API. Do people have preferences/concerns regarding creating a separate branch for this project?
Please feel free to start a discussion on this JIRA - https://issues.apache.org/jira/browse/KAFKA-364
Thanks,
Neha
Re: Consumer re-design proposal
Jay Kreps 2012-06-12, 16:59
This is a great summary, Neha. It would be good to get people's feedback on this since we don't want to keep breaking API and protocol compatibility here, so the hope is to really get it right this time now that we have seen all the use cases and lived with the output for a while. I think the consumer design is a pretty hard protocol and API design problem, so it's fun to think about.
If I were to summarize Neha's requirements list, I think there are three high-level goals:
1. Simplify the consumer protocol to enable ease of development of consumer clients in other languages.
2. Try to replace the "simple consumer" and "high level consumer" with a single, general interface that has all the advantages of both.
3. Support a bunch of use cases that either we didn't think of, or that weren't possible in the partitioning model of the pre-0.8 code base.
-Jay
Re: Consumer re-design proposal
Evan Chan 2012-06-14, 20:39
I would like to throw in a couple of use cases:
- Allow the new consumer to reset its offset to either the current largest or smallest. This would be a great way to restart a process that has fallen behind. The only way I know how to do this today, with the high-level consumer, is to delete the ZK nodes manually and restart the consumer.
- Allow the consumer to reset its offset to some arbitrary value, and then write that offset into ZK. Kind of like the first case, but would make rewinding/replays much easier.
Modularity (the ability to layer the ZK infrastructure on top of the simple interface) would be great.
thanks,
Evan
--
Evan Chan
Senior Software Engineer | www.ooyala.com
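The two reset use cases above can be sketched roughly as follows. This is only an illustration under stated assumptions: `PartitionLog`, `OffsetStore` and `reset_offset` are hypothetical names, not part of any Kafka client, and the in-memory dictionary merely stands in for the offsets that the thread describes as being written into ZK.

```python
from dataclasses import dataclass, field

# Hypothetical names throughout; none of this is an existing Kafka API.
SMALLEST = "smallest"
LARGEST = "largest"


@dataclass
class PartitionLog:
    """Range of offsets currently available for one partition (illustrative)."""
    smallest: int  # oldest offset still retained
    largest: int   # offset at the current end of the log


@dataclass
class OffsetStore:
    """In-memory stand-in for the durable offset store (ZK in the thread)."""
    committed: dict = field(default_factory=dict)

    def commit(self, group, partition, offset):
        self.committed[(group, partition)] = offset

    def fetch(self, group, partition):
        return self.committed.get((group, partition))


def reset_offset(store, group, partition, log, target):
    """Resolve `target` to a concrete offset, validate it, and persist it.

    `target` is SMALLEST, LARGEST, or an explicit integer offset.
    """
    if target == SMALLEST:
        offset = log.smallest
    elif target == LARGEST:
        offset = log.largest
    else:
        offset = int(target)
        if not log.smallest <= offset <= log.largest:
            raise ValueError(
                f"offset {offset} outside [{log.smallest}, {log.largest}]"
            )
    store.commit(group, partition, offset)
    return offset
```

With something like this in the client, restarting a consumer that has fallen behind would be a call such as `reset_offset(store, "my-group", 0, log, LARGEST)` rather than a manual deletion of ZK nodes.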
Re: Consumer re-design proposal
Marcos Juarez 2012-06-14, 21:45
Throwing a +1 on "Allow the consumer to reset its offset to some arbitrary value, and then write that offset into ZK".
We're currently running into a scenario where we would like to have 100% reliability, and we're losing a few messages when a connection breaks while a few messages are still in the OS TCP buffers. So, we're planning on shifting the ZK offset by a few seconds "back in time" if we detect a broker has gone down, to make sure all the messages will actually be delivered to the end consumer when that broker comes back up, even if there's a small number of overlapping messages.
Thanks,
Marcos
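The rewind idea described above can be sketched like this. It is a simplified illustration, not Marcos's implementation: the margin is a count of messages rather than seconds, offsets are treated as list indices, and the `rewind` and `redeliver` helpers are hypothetical. Skipping already-processed offsets is an addition for the sketch; the message above simply accepts the overlap.

```python
def rewind(committed: int, margin: int, smallest: int = 0) -> int:
    """Move the committed offset back by `margin`, never below `smallest`."""
    return max(smallest, committed - margin)


def redeliver(log, start, processed):
    """Deliver messages from `start` onward, skipping offsets already processed.

    `processed` is the set of offsets the application has fully handled;
    it is updated in place so a second replay stays idempotent.
    """
    delivered = []
    for offset in range(start, len(log)):
        if offset in processed:
            continue  # overlapping message caused by the rewind
        processed.add(offset)
        delivered.append(log[offset])
    return delivered


if __name__ == "__main__":
    log = [f"m{i}" for i in range(10)]
    processed = set(range(6))  # m0..m5 were handled by the application
    committed = 8              # but the stored offset ran ahead: m6, m7 were lost
    start = rewind(committed, margin=4)
    print(start, redeliver(log, start, processed))
```

The trade-off is the one stated in the message: rewinding turns a possible loss into a possible duplicate, so the consumer either tolerates the overlap or tracks what it has already handled.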
Re: Consumer re-design proposal
Neha Narkhede 2012-06-14, 21:53
Thanks for the feedback! I moved it to https://issues.apache.org/jira/browse/KAFKA-364, so that we can keep track of these.
-Neha
RE: Consumer re-design proposal
Sybrandy, Casey 2012-06-18, 18:40
Would porting the consumer/producer code to C be a good idea? I say this because, at least with most languages I know of, leveraging a C library is pretty easy. This way, you would have to maintain only the C library, and others can make/maintain wrappers for their languages. Having to port to other languages is going to cause a significant amount of maintenance if you change the protocol in the future.
Re: Consumer re-design proposal
Chris Burroughs 2012-06-19, 02:00
On 06/12/2012 12:59 PM, Jay Kreps wrote:
> 2. Try to replace the "simple consumer" and "high level consumer" with a single, general interface that has all the advantages of both.
I've read through the wiki pages but think I'm missing the forest for the trees.
For a consumer that wants "Manual partition assignment" and "Manual offset management", what does the proposed design offer over the existing SimpleConsumer?
Re: Consumer re-design proposal
Neha Narkhede 2012-06-19, 17:09
Chris,
One of the goals of thinning the Kafka consumer client is removing the zookeeper client from the consumer. Without this, Kafka consumer client would depend on the stability of a zookeeper client.
>> For a consumer that wants "Manual partition assignment" and "Manual offset management", what does the proposed offer over the existing SimpleConsumer?
Right now, we have two Kafka consumer clients, and some functionality is possible in one but not the other. Some users have requested features that would require some combination of the functionality offered by the two consumer clients. We think it might be a good idea to collect feedback and try to design a single consumer client API that satisfies these requirements. But it's unclear if this is quite the right solution.
We will be writing up some concrete API/protocol proposal soon. I will send it around for more detailed feedback.
Thanks, Neha
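To make the "combination of functionality" point concrete, here is a rough sketch of a single configuration in which partition assignment and offset management are chosen independently. It is not the proposed API (that write-up is still to come); `ConsumerConfig`, `assign` and the round-robin rule are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ConsumerConfig:
    """Hypothetical settings for a single, general consumer client."""
    # None: the client assigns partitions; a list: manual partition assignment.
    partitions: Optional[List[int]] = None
    # True: the client commits offsets itself; False: manual offset management.
    auto_commit: bool = True


def assign(config, all_partitions, member_index, member_count):
    """Return the partitions this group member should read.

    Manual assignment wins when given; otherwise partitions are spread
    round-robin over the members (an assumed rule, chosen for simplicity).
    """
    if config.partitions is not None:
        return list(config.partitions)
    ordered = sorted(all_partitions)
    return [p for i, p in enumerate(ordered) if i % member_count == member_index]
```

Because the two settings are independent, a combination such as automatic assignment with manual offset management becomes expressible, which is the kind of request described above.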
Re: Consumer re-design proposal
Dave Barr 2012-06-20, 05:49
On Tue, Jun 19, 2012 at 10:09 AM, Neha Narkhede <[EMAIL PROTECTED]> wrote:
> One of the goals of thinning the Kafka consumer client is removing the zookeeper client from the consumer. Without this, Kafka consumer client would depend on the stability of a zookeeper client.
If there's a stability issue with the zookeeper client, then that should be addressed.
ZK is a fine tool for service discovery and coordination. It seems like any new system that forced me, as a consumer, to use yet another system to bootstrap and discover where my brokers are for a topic would be a step backward.
I'm curious, why, specifically, is removing ZK a design goal (especially when it's such a core component of the broker)? I think of other projects, like HBase, which seem to have no issue with using ZK in their client.
--Dave
Re: Consumer re-design proposal
Ross Black 2012-06-20, 13:30
I added a couple of comments to the issue https://issues.apache.org/jira/browse/KAFKA-364 (I was not certain whether you wanted comments on the mailing list, the wiki page, or the issue?)
Thanks,
Ross
Re: Consumer re-design proposal
Jun Rao 2012-06-14, 16:08
If nobody objects, we can create a separate consumer redesign branch. This way, everyone can see the changes and progress.
Thanks,
Jun