Kafka, mail # dev - producer rewrite


Re: producer rewrite
Jay Kreps 2014-01-24, 21:37
So folks, there are some comments on the RB. I take it from this discussion
that people are cool with me just checking in what I have and addressing the
comments asynchronously? If there are no objections I will do that.

-Jay
On Thu, Jan 23, 2014 at 12:56 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:

> Cool, I've uploaded a patch and rb here:
> https://issues.apache.org/jira/browse/KAFKA-1227
>
> -Jay
>
>
> On Thu, Jan 23, 2014 at 12:00 PM, Joe Stein <[EMAIL PROTECTED]> wrote:
>
>> awesome! +1 for checking this in as is as you suggest
>>
>> /*******************************************
>>  Joe Stein
>>  Founder, Principal Consultant
>>  Big Data Open Source Security LLC
>>  http://www.stealth.ly
>>  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
>> ********************************************/
>>
>>
>> On Thu, Jan 23, 2014 at 2:37 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
>>
>> > This approach sounds reasonable to me. Since the new code will not be
>> > used in the current kafka jar, we can still release 0.8.1 off trunk when
>> > it's ready.
>> >
>> > Thanks,
>> >
>> > Jun
>> >
>> >
>> > On Thu, Jan 23, 2014 at 10:23 AM, Jay Kreps <[EMAIL PROTECTED]> wrote:
>> >
>> > > Hey all,
>> > >
>> > > I have been working on a rewrite of the producer as described in the wiki
>> > > below and discussed in a few previous threads:
>> > > https://cwiki.apache.org/confluence/display/KAFKA/Client+Rewrite
>> > >
>> > > My code still has some bugs and is a bit rough in parts, but it
>> > > functions in the basic cases. I did some basic performance tests over
>> > > localhost, and the new approach has paid off quite significantly--for small
>> > > (10 byte) messages a single thread on my laptop can send over 1m
>> > > messages/second, and with larger messages easily maxes out the server.
>> > >
>> > > The difference between "sync" and "async" producer largely disappears--all
>> > > requests immediately return a future response which can be used to get the
>> > > behavior of either sync or async usage, and we batch whenever the producer
>> > > is under load using a "group commit"-like approach. You can encourage
>> > > additional batching by incurring a small amount of latency (as before).
>> > >
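A minimal Java sketch of the future-returning send described above. `ToyProducer`, its `send` method, and the in-memory sender thread are hypothetical stand-ins for illustration and are not the API from the patch. The "group commit"-like step is approximated by draining whatever has queued up while the sender thread was busy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

public class FutureSendSketch {

    /** Hypothetical in-memory producer; it talks to no broker. */
    static final class ToyProducer implements AutoCloseable {
        private final BlockingQueue<CompletableFuture<Long>> queue = new LinkedBlockingQueue<>();
        private final AtomicLong nextOffset = new AtomicLong();
        private final Thread sender;

        ToyProducer() {
            sender = new Thread(this::runSender, "toy-sender");
            sender.setDaemon(true);
            sender.start();
        }

        /** Returns immediately; the future completes once the record is "acknowledged". */
        CompletableFuture<Long> send(String message) {
            // The payload is ignored in this toy; only the acknowledgement is modelled.
            CompletableFuture<Long> ack = new CompletableFuture<>();
            queue.add(ack);
            return ack;
        }

        private void runSender() {
            List<CompletableFuture<Long>> batch = new ArrayList<>();
            try {
                while (true) {
                    batch.add(queue.take()); // wait for at least one record
                    queue.drainTo(batch);    // plus everything that piled up meanwhile
                    // One simulated request acknowledges the whole batch in order.
                    for (CompletableFuture<Long> ack : batch) {
                        ack.complete(nextOffset.getAndIncrement());
                    }
                    batch.clear();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // close() was called
            }
        }

        @Override
        public void close() {
            sender.interrupt();
        }
    }

    public static void main(String[] args) {
        try (ToyProducer producer = new ToyProducer()) {
            // "Sync" usage: block on the future before continuing.
            long offset = producer.send("hello").join();
            System.out.println("sync send acknowledged at offset " + offset);

            // "Async" usage: attach a callback and carry on.
            CompletableFuture<Void> done = producer.send("world")
                .thenAccept(o -> System.out.println("async send acknowledged at offset " + o));
            done.join(); // only so the demo does not exit before the callback runs
        }
    }
}
```

The caller, not the producer, decides between the two styles: blocking on the returned future gives the old sync behavior, and attaching a callback gives the async one.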
>> > > Let's talk about how to integrate this code.
>> > >
>> > > This is a from-scratch rewrite of the producer code. As such it is a pretty
>> > > major change. So far I have mostly been working on my own. I'd like to
>> > > start getting feedback before I get too far along--no point in my polishing
>> > > things that are going to be significantly revised in review, after all.
>> > >
>> > > As such here is what I would propose:
>> > >
>> > > 1. I'll put up a preliminary patch. Since this code is a completely
>> > > standalone module it will not destabilize the existing server or existing
>> > > producer (in fact there is no change to those). I will avoid including
>> > > build support in this patch until we get the gradle stuff worked out so as
>> > > to not break that patch (hopefully that moves along). Let's take this patch
>> > > "as is" but with no expectation that the code is complete or that checkin
>> > > implies everyone agrees with every design decision. I will follow up with
>> > > subsequent patches as we do reviews and discussions.
>> > >
>> > > 2. I'll send out a few higher-level topics for discussion threads. Let's
>> > > get to consensus on these. I think micro-reviewing minor correctness issues
>> > > won't be productive until we make higher-level decisions. The topics I'd
>> > > like to discuss include:
>> > > a. The producer code:
>> > >      - The public API
>> > >      - The configurations: their names, and the general knobs we are
>> > >      - Client message serialization
>> > >      - The instrumentation to have
>> > >      - The blocking and batching behavior
>> > > b. The common code and a few other cross-cutting policy things:
>> > >      - The approach to protocol definition and request serialization