Re: Client improvement discussion
+1 to making the API use bytes and pushing serialization into the client.
This is effectively what I am doing currently anyway. I implemented a generic
Encoder<ByteString> which just passes the bytes through.
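
For anyone curious, a pass-through encoder can be sketched roughly as below. This is only an illustration under assumptions: the Encoder interface here is a hypothetical local stand-in (not the actual Kafka class), and java.nio.ByteBuffer stands in for ByteString so the snippet has no external dependencies.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for an encoder interface; declared locally so the
// sketch is self-contained. It is NOT the real Kafka type.
interface Encoder<T> {
    byte[] toBytes(T value);
}

// Pass-through encoder: the payload is already serialized by the caller,
// so the encoder only copies the bytes out unchanged.
final class PassThroughEncoder implements Encoder<ByteBuffer> {
    @Override
    public byte[] toBytes(ByteBuffer value) {
        // duplicate() shares the content but has its own position/limit,
        // so the caller's buffer state is left untouched.
        ByteBuffer view = value.duplicate();
        byte[] out = new byte[view.remaining()];
        view.get(out);
        return out;
    }
}
```

With something like this in place, the application does its own serialization up front and hands the producer opaque bytes.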

I also like the idea of the client being written in pure Java. Interacting
with Scala code from Java isn't nearly as nice as the other way around.

Just my 2 cents.


On Fri, Jul 26, 2013 at 2:46 PM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:

> Jay,
> This seems like a great direction.  Simplifying the consumer client would
> be a big win, and +1 for more native Java client integration.
> On the last point, regarding memory usage for buffering per partition: I
> would think it could be possible to devise a dynamic queuing system, to
> allow higher volume partitions to have larger effective buffers than
> smaller, low-volume partitions.  Thus, if you reserve a fixed
> total.buffer.memory, you could allocate units of buffer space which could
> then be composed to make larger buffers (perhaps not necessarily
> contiguous).  The long-tail of low-volume partitions could also be moved to
> some sort of auxiliary, non-collated buffer space, as they are less likely
> to benefit from contiguous buffering anyway.
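
One way to picture the chunked allocation described above is the rough sketch below. It assumes a fixed total.buffer.memory split into equal-sized chunks that partitions borrow on demand, so busy partitions end up holding more (non-contiguous) chunks than quiet ones. The class name, method names, and the String partition key are all hypothetical, and the separate auxiliary space for the long tail is not modeled.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a fixed memory budget carved into equal chunks.
final class ChunkedBufferPool {
    private final int chunkSize;
    private final Deque<ByteBuffer> free = new ArrayDeque<>();
    private final Map<String, List<ByteBuffer>> inUse = new HashMap<>();

    ChunkedBufferPool(int totalBufferMemory, int chunkSize) {
        this.chunkSize = chunkSize;
        // Pre-allocate every chunk so the budget is never exceeded.
        for (int i = 0; i < totalBufferMemory / chunkSize; i++) {
            free.push(ByteBuffer.allocate(chunkSize));
        }
    }

    // Appends a record to the partition's chunk list, borrowing chunks as
    // needed. Returns false (writing nothing) if the pool cannot fit it.
    synchronized boolean append(String partition, byte[] record) {
        List<ByteBuffer> chunks =
            inUse.computeIfAbsent(partition, p -> new ArrayList<>());
        ByteBuffer tail = chunks.isEmpty() ? null : chunks.get(chunks.size() - 1);
        long available = (tail == null ? 0 : tail.remaining())
            + (long) free.size() * chunkSize;
        if (record.length > available) {
            return false;
        }
        int offset = 0;
        while (offset < record.length) {
            if (tail == null || !tail.hasRemaining()) {
                tail = free.pop();   // borrow another chunk from the pool
                chunks.add(tail);
            }
            int n = Math.min(tail.remaining(), record.length - offset);
            tail.put(record, offset, n);
            offset += n;
        }
        return true;
    }

    // Returns all of a partition's chunks to the pool, e.g. after a send.
    synchronized void release(String partition) {
        List<ByteBuffer> chunks = inUse.remove(partition);
        if (chunks == null) {
            return;
        }
        for (ByteBuffer chunk : chunks) {
            chunk.clear();
            free.push(chunk);
        }
    }

    synchronized int chunksHeldBy(String partition) {
        List<ByteBuffer> chunks = inUse.get(partition);
        return chunks == null ? 0 : chunks.size();
    }

    synchronized int freeChunks() {
        return free.size();
    }
}
```

A real implementation would also need a policy for what happens when the pool is exhausted (block, drop, or force a flush); the sketch simply reports failure.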
> Fun stuff.
> Jason
> On Fri, Jul 26, 2013 at 3:00 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
> > I sent around a wiki a few weeks back proposing a set of client
> > improvements that essentially amount to a rewrite of the producer and
> > consumer Java clients.
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/Client+Rewrite
> >
> > The below discussion assumes you have read this wiki.
> >
> > I started to do a little prototyping for the producer and wanted to share
> > some of the ideas that came up to get early feedback.
> >
> > First, a few simple but perhaps controversial things to discuss.
> >
> > Rollout
> > Phase 1: We add the new clients. No change on the server. Old clients
> > still exist. The new clients will be entirely in a new package so there
> > will be no possibility of name collision.
> > Phase 2: We swap out all shared code on the server to use the new client
> > stuff. At this point the old clients still exist but are essentially
> > deprecated.
> > Phase 3: We remove the old client code.
> >
> > Java
> > I think we should do the clients in Java. Making our users deal with
> > Scala's incompatibility issues and crazy stack traces causes people a
> > lot of pain. Furthermore, we end up having to wrap everything now to get
> > a usable Java API anyway for non-Scala people. This does mean maintaining
> > a substantial chunk of Java code, which is maybe less fun than Scala. But
> > basically I think we should optimize for the end user and produce a
> > standalone pure-Java jar with no dependencies.
> >
> > Jars
> > We definitely want to separate out the client jar. There is also a fair
> > amount of code shared between client and server (exceptions, protocol
> > definition, utils, and the message set implementation). Two approaches.
> > Two jar approach: split kafka.jar into kafka-clients.jar and
> > kafka-server.jar with the server depending on the clients. The advantage
> > of this is that it is simple. The disadvantage is that things like utils
> > and protocol definition will be in the client jar though technically they
> > belong equally to the server.
> > Many jar approach: split kafka.jar into kafka-common.jar,
> > kafka-producer.jar, kafka-consumer.jar, kafka-admin.jar, and
> > kafka-server.jar. The disadvantage of this is that the user needs two
> > jars (common + something) which is for sure going to confuse people. I
> > also think this will tend to spawn more jars over time.
> >
> > Background threads
> > I am thinking of moving both serialization and compression out of the
> > background send thread. I will explain a little about this idea below.
> >
> > Serialization
> > I am not sure if we should handle serialization in the client at all.
> > Basically I wonder if our own API wouldn't just be a lot simpler if we

Jay Kreps 2013-07-29, 04:58