Kafka >> mail # dev >> producer rewrite


Re: producer rewrite
This approach sounds reasonable to me. Since the new code will not be
used in the current kafka jar, we can still release 0.8.1 off trunk when
it's ready.

Thanks,

Jun
On Thu, Jan 23, 2014 at 10:23 AM, Jay Kreps <[EMAIL PROTECTED]> wrote:

> Hey all,
>
> I have been working on a rewrite of the producer as described in the wiki
> below and discussed in a few previous threads:
> https://cwiki.apache.org/confluence/display/KAFKA/Client+Rewrite
>
> My code still has some bugs and is a bit rough in parts, but it
> functions in the basic cases. I did some basic performance tests over
> localhost, and the new approach has paid off quite significantly--for small
> (10 byte) messages a single thread on my laptop can send over 1m
> messages/second, and with larger messages easily maxes out the server.
>
> The difference between the "sync" and "async" producer largely disappears--all
> requests immediately return a future response which can be used to get the
> behavior of either sync or async usage, and we batch whenever the producer
> is under load using a "group commit"-like approach. You can encourage
> additional batching by incurring a small amount of latency (as before).
>
> Let's talk about how to integrate this code.
>
> This is a from-scratch rewrite of the producer code. As such it is a pretty
> major change. So far I have mostly been working on my own. I'd like to
> start getting feedback before I get too far along--no point in my polishing
> things that are going to be significantly revised in review, after all.
>
> As such here is what I would propose:
>
> 1. I'll put up a preliminary patch. Since this code is a completely
> standalone module it will not destabilize the existing server or existing
> producer (in fact there is no change to those). I will avoid including
> build support in this patch until we get the gradle stuff worked out so as
> to not break that patch (hopefully that moves along). Let's take this patch
> "as is" but with no expectation that the code is complete or that checkin
> implies everyone agrees with every design decision. I will follow-up with
> subsequent patches as we do reviews and discussions.
>
> 2. I'll send out a few higher-level topics for discussion threads. Let's
> get to consensus on these. I think micro-reviewing minor correctness issues
> won't be productive until we make higher level decisions. The topics I'd
> like to discuss include:
> a. The producer code:
>      - The public API
>      - The configurations: their names, and the general knobs we are
>      - Client message serialization
>      - The instrumentation to have
>      - The blocking and batching behavior
> b. The common code and a few other cross-cutting policy things
>      - The approach to protocol definition and request serialization
>      - The config definition helper code
>      - The metrics package
>      - The project layout
>      - The Java coding style and the use of Java
>      - The approach to logging
>
> This is somewhat backwards, but I think it will be easier to handle changes
> that fall out of these discussions against an existing code base that is
> checked in; otherwise each revision will be a brand new very large patch.
>
> If there are no objections I will toss up this code and kick off some of these
> discussions.
>
> -Jay
>
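
To make the point in the quoted message about sync and async usage collapsing into one future-returning call more concrete, here is a minimal sketch in plain Java. It is not the code from the patch and not the Kafka API: the class FutureProducerSketch, its send method, and the fake "offset" result are all hypothetical names invented for illustration, and the network send is simulated. It assumes a single background sender thread that drains whatever has queued up since its last pass, which is one simple way to get "group commit"-like batching under load.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; none of these names come from Kafka.
final class FutureProducerSketch implements AutoCloseable {

    // A queued record paired with the future handed back to the caller.
    private static final class Pending {
        final String value;
        final CompletableFuture<Long> result = new CompletableFuture<>();
        Pending(String value) { this.value = value; }
    }

    private final BlockingQueue<Pending> queue = new LinkedBlockingQueue<>();
    private final Thread sender;
    private volatile boolean running = true;
    private long nextOffset = 0; // only touched by the sender thread

    FutureProducerSketch() {
        sender = new Thread(this::runSender, "sketch-sender");
        sender.setDaemon(true);
        sender.start();
    }

    // Always returns immediately; the caller decides whether to block.
    CompletableFuture<Long> send(String value) {
        Pending p = new Pending(value);
        queue.add(p);
        return p.result;
    }

    private void runSender() {
        List<Pending> batch = new ArrayList<>();
        try {
            while (running || !queue.isEmpty()) {
                Pending first = queue.poll(10, TimeUnit.MILLISECONDS);
                if (first == null) continue;
                batch.add(first);
                // Everything that arrived while we were busy joins this batch.
                queue.drainTo(batch);
                // Simulated send: assign increasing fake offsets in queue order.
                for (Pending p : batch) p.result.complete(nextOffset++);
                batch.clear();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    @Override
    public void close() throws InterruptedException {
        running = false;
        sender.join(); // records queued before close() are still completed
    }

    public static void main(String[] args) throws Exception {
        FutureProducerSketch producer = new FutureProducerSketch();

        // "Sync" usage: block on the future right away.
        long offset = producer.send("hello").get();
        System.out.println("sync send acknowledged at offset " + offset);

        // "Async" usage: attach a callback and keep going.
        for (int i = 0; i < 5; i++) {
            producer.send("msg-" + i)
                    .thenAccept(o -> System.out.println("async ack at offset " + o));
        }
        producer.close();
    }
}
```

In this sketch a caller that wants the old sync behavior calls get() on the returned future, while a caller that wants async behavior attaches a callback or ignores the future. The batch size is not configured anywhere; it simply grows when callers enqueue faster than the sender loop runs.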

 