Kafka, mail # dev - Client improvement discussion


Re: Client improvement discussion
Jay Kreps 2013-08-02, 19:50
I believe there are some open source C++ producer implementations. At
LinkedIn we have a C++ implementation, and we would like to open source it
if there is interest. We would also like to eventually include a C++
consumer.

-Jay
On Mon, Jul 29, 2013 at 6:03 AM, Sybrandy, Casey <
[EMAIL PROTECTED]> wrote:

> In the past there was some discussion about having a C client for non-JVM
> languages.  Is this still planned as well?  Being able to work with Kafka
> from other languages would be a great thing.  Where I work, we interact
> with Kafka via Java and Ruby (producer), so having an official C library
> that can be used from within Ruby would make it easier to have the same
> version of the client in Java and Ruby.
>
> -----Original Message-----
> From: Jay Kreps [mailto:[EMAIL PROTECTED]]
> Sent: Friday, July 26, 2013 3:00 PM
> To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: Client improvement discussion
>
> I sent around a wiki a few weeks back proposing a set of client
> improvements that essentially amount to a rewrite of the producer and
> consumer java clients.
>
> https://cwiki.apache.org/confluence/display/KAFKA/Client+Rewrite
>
> The below discussion assumes you have read this wiki.
>
> I started to do a little prototyping for the producer and wanted to share
> some of the ideas that came up to get early feedback.
>
> First, a few simple but perhaps controversial things to discuss.
>
> Rollout
> Phase 1: We add the new clients. No change on the server. Old clients
> still exist. The new clients will be entirely in a new package so there
> will be no possibility of name collision.
> Phase 2: We swap out all shared code on the server to use the new client
> stuff. At this point the old clients still exist but are essentially
> deprecated.
> Phase 3: We remove the old client code.
>
> Java
> I think we should do the clients in Java. Making our users deal with
> Scala's incompatibility issues and crazy stack traces causes people a lot
> of pain. Furthermore we end up having to wrap everything now to get a
> usable Java API anyway for non-Scala people. This does mean maintaining a
> substantial chunk of Java code, which is maybe less fun than Scala. But
> basically I think we should optimize for the end user and produce a
> standalone pure-Java jar with no dependencies.
>
> Jars
> We definitely want to separate out the client jar. There is also a fair
> amount of code shared between the clients and the server (exceptions,
> protocol definition, utils, and the message set implementation). Two
> approaches.
> Two jar approach: split kafka.jar into kafka-clients.jar and
> kafka-server.jar with the server depending on the clients. The advantage of
> this is that it is simple. The disadvantage is that things like utils and
> protocol definition will be in the client jar though technically they belong
> equally to the server.
> Many jar approach: split kafka.jar into kafka-common.jar,
> kafka-producer.jar, kafka-consumer.jar, kafka-admin.jar, and
> kafka-server.jar. The disadvantage of this is that the user needs two jars
> (common + something) which is for sure going to confuse people. I also
> think this will tend to spawn more jars over time.
>
> Background threads
> I am thinking of moving both serialization and compression out of the
> background send thread. I will explain a little about this idea below.
>
> Serialization
> I am not sure if we should handle serialization in the client at all.
> Basically I wonder if our own API wouldn't just be a lot simpler if we
> took a byte[] key and byte[] value and let people serialize stuff
> themselves.
> Injecting a class name for us to create the serializer is more roundabout
> and has a lot of problems if the serializer itself requires a lot of
> configuration or other objects to be instantiated.
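
The serialization trade-off quoted above can be sketched in Java roughly as
follows. This is only an illustration: every name in it (ByteProducer,
InMemoryProducer, Serializer, PrefixSerializer, fromClassName) is
hypothetical and is not taken from the wiki or from the Kafka code. It
assumes a producer that takes a byte[] key and byte[] value, and contrasts
that with a client that has to build a serializer from a class name alone.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SerializationSketch {

    // Hypothetical bytes-only API: the caller serializes key and value itself.
    interface ByteProducer {
        void send(String topic, byte[] key, byte[] value);
    }

    // Stand-in producer that records values instead of touching the network.
    static class InMemoryProducer implements ByteProducer {
        final List<byte[]> sentValues = new ArrayList<>();

        @Override
        public void send(String topic, byte[] key, byte[] value) {
            sentValues.add(value);
        }
    }

    // Hypothetical pluggable serializer, as in the class-name approach.
    interface Serializer {
        byte[] toBytes(Object o);
    }

    // Needs configuration (a prefix), so it has no no-arg constructor.
    static class PrefixSerializer implements Serializer {
        private final String prefix;

        PrefixSerializer(String prefix) {
            this.prefix = prefix;
        }

        @Override
        public byte[] toBytes(Object o) {
            return (prefix + o).getBytes(StandardCharsets.UTF_8);
        }
    }

    // What a client would have to do when handed only a class name.
    static Serializer fromClassName(String className)
            throws ReflectiveOperationException {
        return (Serializer) Class.forName(className)
                .getDeclaredConstructor()
                .newInstance();
    }

    public static void main(String[] args) {
        // Bytes-only: the caller configures the serializer however it likes.
        Serializer serializer = new PrefixSerializer("v1:");
        InMemoryProducer producer = new InMemoryProducer();
        producer.send("events",
                "user-42".getBytes(StandardCharsets.UTF_8),
                serializer.toBytes("login"));
        System.out.println(
                new String(producer.sentValues.get(0), StandardCharsets.UTF_8));

        // Class-name injection: reflection has no way to supply the prefix.
        try {
            fromClassName(PrefixSerializer.class.getName());
        } catch (ReflectiveOperationException e) {
            System.out.println("could not instantiate: " + e);
        }
    }
}
```

In the bytes-only variant the client never needs to know how the serializer
was built, which is the simplification being argued for. The reflective
variant fails as soon as the serializer needs a constructor argument, unless
the client also defines some separate configuration hook.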
>
> Partitioning
> The real question with serialization is whether the partitioning should
> happen on the java object or on the byte array key. The argument for doing
> it on the java object is that it is easier to do something like a range