Does Kafka still only support blocking stream iterstors? It would be great to pass a timeout or have a poll() operation for fetching items. Right now I'm always blocking in this call: for (m <- stream) ...
It's worth mentioning that we are interested in exploring potential generalizations of the producer and consumer API, but as a practical matter most of the committers are working on getting a stable 0.8 release out the door. So an improved consumer and producer api would be a 0.9 feature.
If you have a concrete thing you are trying to do now that is awkward it would be great to hear about the use case.
Possible goals of improving the apis and client impls would include the following:
Producer: 1. Include the offset in the information returned to the producer 2. Pipeline producer requests to improve throughput for synchronous production
Consumer 1. Simplify api while supporting various advanced use cases like multi-stream consumption 2. Make partition assignment optional and server-side (this is currently the difference between the zk consumer and the simple consumer) 3. Make offset management optional 4. Remove threading from the consumer 5. Simplify consumer memory management
-Jay On Mon, Jan 21, 2013 at 8:05 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
For the consumer: - Separation of the consumer logic from the main logic - Making it easier to build the consumer for different versions of Scala (say 2.10) - Make it easier to read from any offset you want, while being able to keep partition management features - Better support for Akka and other non-blocking / event-based frameworks (instead of a timeout, implement true hasNext functionality, for example)
Great points, some comments: - Not sure if I understand what you mean by separating consumer and main logic. - Yes, cross-building, I think this is in progress now for kafka as a whole so it should be in either 0.8 or 0.8.1 - Yes, forgot to mention offset initialization, but that is definitely needed.
For the hasNext functionality, even that is not very good since if you have two streams and want to take the next message from either you would have to busy wait calling hasNext on both in a loop.
An alternative would be something like val client = new ConsumerClient(topics, config) client.select(timeout: Long): Iterator[MessageAndMetadata]
This method would have no internal threading. It would scatter-gather over the topic/partitions assigned to this consumer (whether they are statically or dynamically assigned would be specified in the config). The select call would internally just do an epoll/select on all the connections and return the first message set it gets back or an empty list if it hits the timeout and no one has responded.
This api is less intuitive then the blocking iterator, but more flexible and enables a better, faster implementation. There would be no threads aside from the client's thread. It allows non-blocking or blocking consumption. And it generalizes easily to consuming from many topics/partitions simultaneously.
We could implement an iterator like wrapper for this to ease the transition that just used this api under the covers.
Anyhow this is a ways out, and we haven't really had any proposals or discussions on it, but this is what I was thinking.
-Jay On Tue, Jan 22, 2013 at 9:37 AM, Evan Chan <[EMAIL PROTECTED]> wrote:
On Tue, Jan 22, 2013 at 10:15 AM, Jay Kreps <[EMAIL PROTECTED]> wrote: I just meant having a separate Scala/Java client jar, so it's more lightweight and easier to build independently.... kind of like the consumers for the other languages.
Hm, I like that API actually. It would definitely be more flexible.
One other potentially large benefit is to decouple broker dependencies from consumer/producer dependencies. This makes upgrading the consumer/producer and managing jar conflicts a lot less of a hassle. Putting the consumer and producer in their own packages might hopefully alleviate some of this. I'm not sure how much the broker is pulling in that the consumer/producer aren't using, but it might be worth a look, if there are a lot of jars that only the broker is using.
On 1/22/13 12:57 PM, "Evan Chan" <[EMAIL PROTECTED]> wrote:
Putting the consumer and producer in their own packages might hopefully
I like this idea. Moving forward, the biggest dependencies that the broker will have and the producer/consumer clients won't are the zookeeper/zkclient jars. It might be worth looking into this. Please can you file a JIRA ?
NEW: Monitor These Apps!
Apache Lucene, Apache Solr and all other Apache Software Foundation project and their respective logos are trademarks of the Apache Software Foundation.
Elasticsearch, Kibana, Logstash, and Beats are trademarks of Elasticsearch BV, registered in the U.S. and in other countries. This site and Sematext Group is in no way affiliated with Elasticsearch BV.
Service operated by Sematext