Neha Narkhede
2012-06-11, 23:52
Jay Kreps
2012-06-12, 16:59
Jun Rao
2012-06-14, 16:08
Evan Chan
2012-06-14, 20:39
Marcos Juarez
2012-06-14, 21:45
Neha Narkhede
2012-06-14, 21:53
Sybrandy, Casey
2012-06-18, 18:40
Chris Burroughs
2012-06-19, 02:00
Neha Narkhede
2012-06-19, 17:09
Dave Barr
2012-06-20, 05:49
Ross Black
2012-06-20, 13:30
-
Consumer re-design proposal
Neha Narkhede (2012-06-11, 23:52)
Hi,
Over the past few months, we've received quite a lot of feedback on the consumer-side features and design. Some of it concerns improvements to the current consumer design, and some of it is simply new feature/API requests. I have attempted to write up the requirements that I've heard on this wiki: https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Client+Re-Design

This would involve some significant changes to the consumer APIs, so we would like to collect feedback on the proposal from our community. Since the list of changes is not small, we would like to understand whether some features are preferred over others and, more importantly, whether some features are not required at all.

Since parts of this proposal are experimental and the consumer-side changes are non-trivial, we would like this initiative not to interfere with the forthcoming replication release. However, it would be good to have people from the community give this some thought and help out with the JIRAs if interested. One way of managing this project could be to create a separate branch from the Kafka trunk and continue development on it. Once it is ready and in good shape, we can think about cutting another release (after 0.8) to release the new consumer API. Do people have preferences or concerns regarding creating a separate branch for this project?

Please feel free to start a discussion on this JIRA: https://issues.apache.org/jira/browse/KAFKA-364

Thanks,
Neha
-
Re: Consumer re-design proposal
Jay Kreps (2012-06-12, 16:59)
This is a great summary, Neha. It would be good to get people's feedback on this, since we don't want to keep breaking API and protocol compatibility here. The hope is to really get it right this time, now that we have seen all the use cases and lived with the output for a while. I think the consumer design is a pretty hard protocol and API design problem, so it's fun to think about.

If I were to summarize Neha's requirements list, I think there are three high-level goals:

1. Simplify the consumer protocol to make it easier to develop consumer clients in other languages.
2. Try to replace the "simple consumer" and "high level consumer" with a single, general interface that has all the advantages of both.
3. Support a bunch of use cases that either we didn't think of, or that weren't possible in the partitioning model of the pre-0.8 code base.

-Jay
-
Re: Consumer re-design proposal
Jun Rao (2012-06-14, 16:08)
If nobody objects, we can create a separate consumer redesign branch. This way, everyone can see the changes and progress.

Thanks,
Jun
-
Re: Consumer re-design proposal
Evan Chan (2012-06-14, 20:39)
I would like to throw in a couple of use cases:

- Allow the new consumer to reset its offset to either the current largest or smallest offset. This would be a great way to restart a process that has fallen behind. The only way I know to do this today, with the high-level consumer, is to delete the ZK nodes manually and restart the consumer.
- Allow the consumer to reset its offset to some arbitrary value, and then write that offset into ZK. This is similar to the first case, but would make rewinds/replays much easier.

Modularity (the ability to layer the ZK infrastructure on top of the simple interface) would be great.

Thanks,
Evan
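Evan's two offset-reset use cases can be pictured with a toy in-memory model. This is a minimal sketch under stated assumptions, not Kafka's API: `PartitionLog`, `OffsetStore`, and `reset_offset` are hypothetical names, a plain dict stands in for the per-group offsets the high-level consumer keeps in ZooKeeper, and offsets are treated as plain integers bounded by the oldest retained offset and the next offset to be written.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple, Union

# (group, topic, partition): hypothetical key for a committed offset.
OffsetKey = Tuple[str, str, int]


@dataclass
class PartitionLog:
    """Toy view of one partition: the range of offsets the broker still holds."""
    smallest: int  # oldest offset not yet removed by retention
    largest: int   # offset the next appended message will receive


@dataclass
class OffsetStore:
    """Hypothetical stand-in for the per-group offsets kept in ZK."""
    offsets: Dict[OffsetKey, int] = field(default_factory=dict)

    def commit(self, key: OffsetKey, offset: int) -> None:
        self.offsets[key] = offset

    def fetch(self, key: OffsetKey) -> int:
        return self.offsets[key]


def reset_offset(store: OffsetStore, key: OffsetKey, log: PartitionLog,
                 target: Union[str, int]) -> int:
    """Move a group's position to "smallest", "largest", or an explicit offset,
    then persist it (the "write that offset into ZK" step)."""
    if target == "smallest":
        offset = log.smallest
    elif target == "largest":
        offset = log.largest
    else:
        offset = int(target)
        if not log.smallest <= offset <= log.largest:
            raise ValueError(
                f"offset {offset} outside [{log.smallest}, {log.largest}]")
    store.commit(key, offset)
    return offset


if __name__ == "__main__":
    store = OffsetStore()
    key = ("my-group", "events", 0)  # hypothetical group, topic, partition
    log = PartitionLog(smallest=1_000, largest=5_000)
    store.commit(key, 1_200)  # a consumer that has fallen far behind
    print(reset_offset(store, key, log, "largest"))  # 5000: skip to new data
    print(reset_offset(store, key, log, 3_000))      # 3000: replay from here
```

Keeping the store behind a small `commit`/`fetch` interface is one way to get the modularity Evan mentions: the reset logic does not care whether the offsets end up in ZooKeeper or somewhere else.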
-
Re: Consumer re-design proposal
Marcos Juarez (2012-06-14, 21:45)
Throwing a +1 on "Allow the consumer to reset its offset to some arbitrary value, and then write that offset into ZK".
We're currently running into a scenario where we would like 100% reliability, but we're losing a few messages when a connection breaks while some messages are still sitting in the OS TCP buffers. So we're planning to shift the ZK offset a few seconds "back in time" if we detect that a broker has gone down, to make sure all the messages are actually delivered to the end consumer when that broker comes back up, even if that means a small number of messages are delivered twice.

Thanks,
Marcos
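The "back in time" rewind Marcos describes needs a way to turn a timestamp into an offset. The sketch below assumes a hypothetical per-partition index of `(offset, append_time)` pairs with non-decreasing times (an assumption, not something the thread says Kafka provides) and uses `bisect` to find the first offset at or after the rewind point. It never moves past the already-committed offset, so the only effect is re-delivery of a few messages, which the downstream consumer must tolerate as duplicates.

```python
import bisect
from typing import List, Tuple

# Hypothetical index: (offset, append_time_in_seconds), sorted by offset,
# with non-decreasing timestamps.
TimeIndex = List[Tuple[int, float]]


def offset_for_time(index: TimeIndex, ts: float) -> int:
    """Return the first offset whose append time is >= ts."""
    if not index:
        raise ValueError("empty time index")
    times = [t for _, t in index]
    i = bisect.bisect_left(times, ts)
    if i == len(index):
        # Every indexed message is older than ts: start after the last one.
        return index[-1][0] + 1
    return index[i][0]


def rewound_offset(index: TimeIndex, committed: int,
                   failure_time: float, rewind_seconds: float) -> int:
    """Restart point a few seconds before a detected broker failure.

    Never returns an offset beyond `committed`, so the rewind can only
    cause messages to be re-delivered, never skipped.
    """
    target = offset_for_time(index, failure_time - rewind_seconds)
    return min(target, committed)


if __name__ == "__main__":
    index = [(100, 10.0), (101, 11.0), (102, 12.5), (103, 14.0), (104, 15.0)]
    # Broker loss detected at t=15.0 while the committed offset was 104;
    # rewinding 3 seconds targets t=12.0, so offsets 102 and 103 are re-read.
    print(rewound_offset(index, committed=104, failure_time=15.0,
                         rewind_seconds=3.0))  # 102
```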
-
Re: Consumer re-design proposal
Neha Narkhede (2012-06-14, 21:53)
Thanks for the feedback! I moved it to https://issues.apache.org/jira/browse/KAFKA-364 so that we can keep track of these.

-Neha
-
RE: Consumer re-design proposal
Sybrandy, Casey (2012-06-18, 18:40)
Would porting the consumer/producer code to C be a good idea? I say this because, in most languages I know of, leveraging a C library is pretty easy. This way, you would have to maintain only the C library, and others could build and maintain wrappers for their languages. Porting the clients to each language separately would create a significant maintenance burden whenever the protocol changes in the future.
-
Re: Consumer re-design proposal
Chris Burroughs (2012-06-19, 02:00)
On 06/12/2012 12:59 PM, Jay Kreps wrote:
> 2. Try to replace the "simple consumer" and "high level consumer" with a
> single, general interface that has all the advantages of both.

I've read through the wiki pages but think I'm missing the forest for the trees.

For a consumer that wants "Manual partition assignment" and "Manual offset management", what does the proposed [design] offer over the existing SimpleConsumer?
-
Re: Consumer re-design proposal
Neha Narkhede (2012-06-19, 17:09)
Chris,
One of the goals of thinning the Kafka consumer client is removing the ZooKeeper client from the consumer. Without this, the Kafka consumer client would depend on the stability of a ZooKeeper client.

> For a consumer that wants "Manual partition assignment" and "Manual offset management", what does the proposed [design] offer over the existing SimpleConsumer?

Right now, we have two Kafka consumer clients, and some functionality is possible in one but not the other. Some users have requested features that would require a combination of the functionality offered by the two consumer clients. We think it might be a good idea to collect feedback and try to design a single consumer client API that satisfies these requirements, but it's unclear whether this is quite the right solution. We will be writing up a concrete API/protocol proposal soon and will send it around for more detailed feedback.

Thanks,
Neha
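One way to picture a single consumer client API covering both existing clients is an entry point that accepts either an explicit partition list (the SimpleConsumer style Chris asks about) or a group membership from which partitions are derived (the high-level consumer style). The sketch below is hypothetical: `range_assign` and `partitions_for` are illustrative names, the contiguous range split is similar in spirit to how the high-level consumer divides partitions among a group but its details here are an assumption, and the ZooKeeper coordination that keeps group membership consistent is left out entirely.

```python
from typing import Dict, Iterable, List, Optional


def range_assign(partitions: Iterable[int],
                 consumers: List[str]) -> Dict[str, List[int]]:
    """Split sorted partitions into contiguous ranges, one per sorted consumer.

    The first len(partitions) % len(consumers) consumers get one extra
    partition; surplus consumers get an empty list.
    """
    parts = sorted(partitions)
    members = sorted(consumers)
    if not members:
        return {}
    per, extra = divmod(len(parts), len(members))
    assignment: Dict[str, List[int]] = {}
    start = 0
    for i, member in enumerate(members):
        count = per + (1 if i < extra else 0)
        assignment[member] = parts[start:start + count]
        start += count
    return assignment


def partitions_for(me: str, all_partitions: Iterable[int],
                   group: Optional[List[str]] = None,
                   manual: Optional[Iterable[int]] = None) -> List[int]:
    """Hypothetical unified entry point: an explicit list means manual
    assignment; otherwise partitions are derived from group membership."""
    if manual is not None:
        return sorted(manual)
    if group is None or me not in group:
        raise ValueError("need a manual assignment or a group containing this consumer")
    return range_assign(all_partitions, group)[me]


if __name__ == "__main__":
    group = ["consumer-b", "consumer-a"]
    print(range_assign(range(5), group))
    # {'consumer-a': [0, 1, 2], 'consumer-b': [3, 4]}
    print(partitions_for("consumer-b", range(5), group=group))    # [3, 4]
    print(partitions_for("consumer-a", range(5), manual=[4, 0]))  # [0, 4]
```

In this shape, manual assignment is simply the case of skipping group coordination, which is one reason the two modes could plausibly share an interface.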
-
Re: Consumer re-design proposal
Dave Barr (2012-06-20, 05:49)
On Tue, Jun 19, 2012 at 10:09 AM, Neha Narkhede <[EMAIL PROTECTED]> wrote:
> One of the goals of thinning the Kafka consumer client is removing the
> ZooKeeper client from the consumer. Without this, the Kafka consumer
> client would depend on the stability of a ZooKeeper client.

If there's a stability issue with the ZooKeeper client, then that should be addressed. ZK is a fine tool for service discovery and coordination. Any new system that forced me, as a consumer, to use yet another system to bootstrap and discover where the brokers for a topic are would be a step backward.

I'm curious: why, specifically, is removing ZK a design goal (especially when it's such a core component of the broker)? Other projects, like HBase, seem to have no issue with using ZK in their clients.

--Dave
-
Re: Consumer re-design proposal
Ross Black (2012-06-20, 13:30)
I added a couple of comments to the issue
https://issues.apache.org/jira/browse/KAFKA-364. (I wasn't certain whether you wanted comments on the mailing list, the wiki page, or the issue.)

Thanks,
Ross