Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Kafka >> mail # user >> Non-blocking Kafka stream iterators


Copy link to this message
-
Re: Non-blocking Kafka stream iterators
Hey Guys,

One other potentially large benefit is to decouple broker dependencies
from consumer/producer dependencies. This makes upgrading the
consumer/producer and managing jar conflicts a lot less of a hassle.
Putting the consumer and producer in their own packages might hopefully
alleviate some of this. I'm not sure how much the broker is pulling in
that the consumer/producer aren't using, but it might be worth a look, if
there are a lot of jars that only the broker is using.

Cheers,
Chris

On 1/22/13 12:57 PM, "Evan Chan" <[EMAIL PROTECTED]> wrote:

>Hi Jay,
>
>Actually, it's mostly the ability to easily cross-build;   also the ease
>of
>understanding the code (less code to grok) and implementing alternatives
>(I
>guess all of those falls under cleanliness).
>
>thanks,
>Evan
>
>
>On Tue, Jan 22, 2013 at 12:47 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
>
>> Hi Evan,
>>
>> Makes sense. Is your goal in separating the client shrinking the jar
>>size?
>> or just general cleanliness?
>>
>> -Jay
>>
>>
>> On Tue, Jan 22, 2013 at 10:53 AM, Evan Chan <[EMAIL PROTECTED]> wrote:
>>
>> > Jay,
>> >
>> > Comments inlined.
>> >
>> > On Tue, Jan 22, 2013 at 10:15 AM, Jay Kreps <[EMAIL PROTECTED]>
>>wrote:
>> >
>> > > Hey Evan,
>> > >
>> > > Great points, some comments:
>> > > - Not sure if I understand what you mean by separating consumer and
>> main
>> > > logic.
>> > >
>> >
>> > I just meant having a separate Scala/Java client jar, so it's more
>> > lightweight and easier to build independently.... kind of like the
>> > consumers for the other languages.
>> >
>> >
>> > > - Yes, cross-building, I think this is in progress now for kafka as
>>a
>> > whole
>> > > so it should be in either 0.8 or 0.8.1
>> > > - Yes, forgot to mention offset initialization, but that is
>>definitely
>> > > needed.
>> > >
>> > > For the hasNext functionality, even that is not very good since if
>>you
>> > have
>> > > two streams and want to take the next message from either you would
>> have
>> > to
>> > > busy wait calling hasNext on both in a loop.
>> > >
>> > > An alternative would be something like
>> > > val client = new ConsumerClient(topics, config)
>> > > client.select(timeout: Long): Iterator[MessageAndMetadata]
>> > >
>> > > This method would have no internal threading. It would
>>scatter-gather
>> > over
>> > > the topic/partitions assigned to this consumer (whether they are
>> > statically
>> > > or dynamically assigned would be specified in the config). The
>>select
>> > call
>> > > would internally just do an epoll/select on all the connections and
>> > return
>> > > the first message set it gets back or an empty list if it hits the
>> > timeout
>> > > and no one has responded.
>> > >
>> >
>> > Hm, I like that API actually.  It would definitely be more flexible.
>> >
>> >
>> > >
>> > > This api is less intuitive then the blocking iterator, but more
>> flexible
>> > > and enables a better, faster implementation. There would be no
>>threads
>> > > aside from the client's thread. It allows non-blocking or blocking
>> > > consumption. And it generalizes easily to consuming from many
>> > > topics/partitions simultaneously.
>> > >
>> > > We could implement an iterator like wrapper for this to ease the
>> > transition
>> > > that just used this api under the covers.
>> > >
>> > > Anyhow this is a ways out, and we haven't really had any proposals
>>or
>> > > discussions on it, but this is what I was thinking.
>> > >
>> > > -Jay
>> > >
>> > >
>> > >
>> > >
>> > > On Tue, Jan 22, 2013 at 9:37 AM, Evan Chan <[EMAIL PROTECTED]> wrote:
>> > >
>> > > > Jay,
>> > > >
>> > > > For the consumer:
>> > > > - Separation of the consumer logic from the main logic
>> > > > - Making it easier to build the consumer for different versions of
>> > Scala
>> > > > (say 2.10)
>> > > > - Make it easier to read from any offset you want, while being
>>able
>> to
>> > > keep
>> > > > partition management features
>> > > > - Better support for Akka and other non-blocking / event-based
>> > frameworks
 
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB