Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Kafka, mail # user - Non-blocking Kafka stream iterators


Copy link to this message
-
Re: Non-blocking Kafka stream iterators
Evan Chan 2013-01-22, 18:53
Jay,

Comments inlined.

On Tue, Jan 22, 2013 at 10:15 AM, Jay Kreps <[EMAIL PROTECTED]> wrote:

> Hey Evan,
>
> Great points, some comments:
> - Not sure if I understand what you mean by separating consumer and main
> logic.
>

I just meant having a separate Scala/Java client jar, so it's more
lightweight and easier to build independently.... kind of like the
consumers for the other languages.
> - Yes, cross-building, I think this is in progress now for kafka as a whole
> so it should be in either 0.8 or 0.8.1
> - Yes, forgot to mention offset initialization, but that is definitely
> needed.
>
> For the hasNext functionality, even that is not very good since if you have
> two streams and want to take the next message from either you would have to
> busy wait calling hasNext on both in a loop.
>
> An alternative would be something like
> val client = new ConsumerClient(topics, config)
> client.select(timeout: Long): Iterator[MessageAndMetadata]
>
> This method would have no internal threading. It would scatter-gather over
> the topic/partitions assigned to this consumer (whether they are statically
> or dynamically assigned would be specified in the config). The select call
> would internally just do an epoll/select on all the connections and return
> the first message set it gets back or an empty list if it hits the timeout
> and no one has responded.
>

Hm, I like that API actually.  It would definitely be more flexible.
>
> This api is less intuitive then the blocking iterator, but more flexible
> and enables a better, faster implementation. There would be no threads
> aside from the client's thread. It allows non-blocking or blocking
> consumption. And it generalizes easily to consuming from many
> topics/partitions simultaneously.
>
> We could implement an iterator like wrapper for this to ease the transition
> that just used this api under the covers.
>
> Anyhow this is a ways out, and we haven't really had any proposals or
> discussions on it, but this is what I was thinking.
>
> -Jay
>
>
>
>
> On Tue, Jan 22, 2013 at 9:37 AM, Evan Chan <[EMAIL PROTECTED]> wrote:
>
> > Jay,
> >
> > For the consumer:
> > - Separation of the consumer logic from the main logic
> > - Making it easier to build the consumer for different versions of Scala
> > (say 2.10)
> > - Make it easier to read from any offset you want, while being able to
> keep
> > partition management features
> > - Better support for Akka and other non-blocking / event-based frameworks
> > (instead of a timeout, implement true hasNext functionality, for example)
> >
> > thanks,
> > Evan
> >
> >
> > On Mon, Jan 21, 2013 at 9:27 AM, Jay Kreps <[EMAIL PROTECTED]> wrote:
> >
> > > It's worth mentioning that we are interested in exploring potential
> > > generalizations of the producer and consumer API, but as a practical
> > matter
> > > most of the committers are working on getting a stable 0.8 release out
> > the
> > > door. So an improved consumer and producer api would be a 0.9 feature.
> > >
> > > If you have a concrete thing you are trying to do now that is awkward
> it
> > > would be great to hear about the use case.
> > >
> > > Possible goals of improving the apis and client impls would include the
> > > following:
> > >
> > > Producer:
> > > 1. Include the offset in the information returned to the producer
> > > 2. Pipeline producer requests to improve throughput for synchronous
> > > production
> > >
> > > Consumer
> > > 1. Simplify api while supporting various advanced use cases like
> > > multi-stream consumption
> > > 2. Make partition assignment optional and server-side (this is
> currently
> > > the difference between the zk consumer and the simple consumer)
> > > 3. Make offset management optional
> > > 4. Remove threading from the consumer
> > > 5. Simplify consumer memory management
> > >
> > > -Jay
> > >
> > >
> > >
> > >
> > > On Mon, Jan 21, 2013 at 8:05 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
> > >
> > > > No, but you can implement it in your application.

*Evan Chan*
Senior Software Engineer |
[EMAIL PROTECTED] | (650) 996-4600
www.ooyala.com | blog <http://www.ooyala.com/blog> |
@ooyala<http://www.twitter.com/ooyala>