Kafka, mail # dev - code layout for new clients - 2014-01-24, 21:25
code layout for new clients
Hey All,

Another topic worth discussing is how to lay out code for the new producer
and consumer as well as the common code they will share. There are really
three questions:
1. Which top-level sub-modules/directories should we have (currently
everything is under core, but presumably we want to split that up)?
2. What jars should we produce and what will be the dependencies between them?
3. What should the package layout be within the modules?

Let's use this thread to discuss that and make a decision.

(2) is arguably the most relevant to the end-user who presumably doesn't
care how we lay out our code, so let's start with that. One constraint we
have is that there is some code that must be shared between client and
server (or else we will just have tons of duplication). This is true for
utilities, data formats, etc. I think the server will likely end up
embedding the consumer for replication and likely the producer for other
uses such as offsets, so the server will necessarily depend on the client.
The client should not depend on the server or we would have a circular
dependency. Thus common code can't be kept with the server.

I think there are several possibilities:
a. 2 jar solution: Have a kafka-client.jar which contains the producer,
consumer, and any future admin client we might add. Have a kafka-server.jar
which contains the server and depends on the client jar.
b. 3 jar solution: Have a kafka-common.jar which contains common code and a
kafka-client.jar and kafka-server.jar. Client would depend on common and
server would depend on common and client. One knock on this approach is
that the common jar isn't really very useful on its own and it is perhaps
kind of irritating to have to have two jars for clients.
c. Multi jar solution: Have a kafka-common.jar, plus one for each client
(kafka-producer.jar, kafka-consumer.jar, kafka-admin.jar).

I would vote for (a) in the absence of any other input because it is the
simplest for the user who just needs a single client jar.
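
To make the dependency constraint concrete, here is a rough sketch that models options (a) and (b) as a small graph and checks that neither contains a circular dependency. The JarLayout class and its methods are made up for illustration and are not part of any Kafka build; the jar names are the ones proposed above. Option (c) could be modelled the same way.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of jar-level dependencies; not part of any Kafka build.
final class JarLayout {
    // Maps each jar name to the jars it depends on directly.
    private final Map<String, List<String>> deps = new HashMap<>();

    // Registers (or overwrites) the direct dependencies of one jar.
    JarLayout depends(String jar, String... on) {
        deps.put(jar, Arrays.asList(on));
        return this;
    }

    // Returns true if following dependencies from some jar leads back to it.
    boolean hasCycle() {
        Set<String> done = new HashSet<>();
        for (String jar : deps.keySet()) {
            if (visit(jar, new HashSet<>(), done)) return true;
        }
        return false;
    }

    // Depth-first search that tracks the jars on the current path.
    private boolean visit(String jar, Set<String> onPath, Set<String> done) {
        if (onPath.contains(jar)) return true;  // back edge: circular dependency
        if (done.contains(jar)) return false;   // already fully explored
        onPath.add(jar);
        for (String next : deps.getOrDefault(jar, Collections.<String>emptyList())) {
            if (visit(next, onPath, done)) return true;
        }
        onPath.remove(jar);
        done.add(jar);
        return false;
    }

    // Option (a): the client jar holds the common code, the server depends on it.
    static JarLayout optionA() {
        return new JarLayout()
            .depends("kafka-client")
            .depends("kafka-server", "kafka-client");
    }

    // Option (b): a separate common jar shared by client and server.
    static JarLayout optionB() {
        return new JarLayout()
            .depends("kafka-common")
            .depends("kafka-client", "kafka-common")
            .depends("kafka-server", "kafka-common", "kafka-client");
    }
}
```

Both optionA().hasCycle() and optionB().hasCycle() return false. Adding a client-to-server edge, as in JarLayout.optionA().depends("kafka-client", "kafka-server"), makes hasCycle() return true, which is the situation we need to avoid.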

For (1) we could either have the code modules mimic the resulting jars or
not. I think there might be some value in separating the producer,
consumer, and common code to avoid crazy internal dependencies between
packages. This could be enforced either by having separate modules which
compile separately and then package into one jar or else by just keeping
everything together and using checkstyle to enforce this. Currently there
is only one module. The only weird thing about having it together is that
common utilities are under the "clients" module, which is a little
unintuitive. I'm not sure if having modules not match the resulting jars
will cause build headaches.
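
As a rough illustration of the kind of rule we would want enforced (whether by separate modules or by checkstyle), here is a sketch of a package-boundary check. This is not how checkstyle implements it; the PackageRules class is hypothetical, and the package and class names in the example are assumptions used only to show the idea that producer and consumer may use common code but not each other.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical package-boundary check; package names are illustrative only.
final class PackageRules {
    // For each package prefix, the other prefixes it is allowed to import from.
    private final Map<String, List<String>> allowed = new LinkedHashMap<>();

    PackageRules allow(String pkg, String... mayImport) {
        allowed.put(pkg, Arrays.asList(mayImport));
        return this;
    }

    // Returns one message for every import that crosses a disallowed boundary.
    List<String> violations(String fromPackage, List<String> imports) {
        List<String> out = new ArrayList<>();
        List<String> ok = allowed.get(fromPackage);
        if (ok == null) return out;  // no rule registered for this package
        for (String imp : imports) {
            if (!imp.startsWith("kafka.")) continue;          // ignore JDK and third-party imports
            if (imp.startsWith(fromPackage + ".")) continue;  // same package or its subpackages
            boolean permitted = false;
            for (String prefix : ok) {
                if (imp.startsWith(prefix + ".")) {
                    permitted = true;
                    break;
                }
            }
            if (!permitted) out.add(fromPackage + " must not import " + imp);
        }
        return out;
    }

    // Assumed rules: common imports nothing else of ours; producer and
    // consumer may import common but not each other.
    static PackageRules clientRules() {
        return new PackageRules()
            .allow("kafka.common")
            .allow("kafka.clients.producer", "kafka.common")
            .allow("kafka.clients.consumer", "kafka.common");
    }
}
```

With these assumed rules, a producer class importing something from kafka.common passes, while a producer class importing from kafka.clients.consumer is reported as a violation.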

Okay, finally, let's discuss the layout of packages in the existing code.

The most important aspect of this is that we separate public from internal
classes. This will make it easier to produce clean javadocs, and will help
us keep these public apis clean and fully documented. Currently the public
packages are
Other alternatives would be to attempt to annotate public classes or to
include some naming scheme like kafka.common.api and
kafka.clients.producer.api that would make it more clear which packages are

Guozhang had several other comments on packages in KAFKA-1227. Let's use
this thread to discuss these or any other suggestions and make a decision
on how to do this.
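
For reference, the annotation alternative mentioned above might look roughly like the following. The @PublicApi annotation and the example classes are hypothetical names invented for this sketch; nothing like this exists in the code today.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical marker for classes that form the supported public API.
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface PublicApi {}

// Example of a class we would document and keep stable.
@PublicApi
final class ExampleProducerConfig {}

// Example of an internal class with no compatibility promise.
final class ExampleInternalBuffer {}

final class ApiCheck {
    // A doc or build tool could use a check like this to select public classes.
    static boolean isPublicApi(Class<?> type) {
        return type.isAnnotationPresent(PublicApi.class);
    }
}
```

The trade-off versus a package naming scheme is that an annotation has to be applied class by class, whereas a package convention can be read directly off the directory layout.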

