Kafka, mail # dev - Re: About Kafka 0.8 producer throughput test


Jay Kreps 2013-01-23, 04:01
This is a good question.

As mentioned, we have some experience running this with no ack, and there
are a lot of downsides. We considered making the ack optional, but this
would complicate the producer API, since we could give back the offset only
in the case where there is an ack.

Thinking about it more, we realized there is no real performance hit, just
latency, and you only pay for the latency if you want to wait for the
response. This resulted in the current tentative plan, which is to make all
requests async and always return a "future response", so you only block if
you want to get the result. This is the best possible end state, since we
can give the rich general API with the same performance as without the ack.
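A minimal Python sketch of the "always return a future" plan, under loud assumptions: `ToyProducer` is a hypothetical class (not any real Kafka client API), a single worker thread stands in for one broker connection, and `time.sleep` simulates the ack round trip. The same `send` call gives synchronous behavior if you block on the future and asynchronous behavior if you don't:

```python
import itertools
import time
from concurrent.futures import Future, ThreadPoolExecutor


class ToyProducer:
    """Hypothetical toy producer (not a real Kafka client): every send is
    asynchronous and returns a Future that resolves to the message's offset
    once the simulated broker acknowledges it."""

    def __init__(self, round_trip_s: float = 0.005) -> None:
        # One worker thread stands in for a single connection to the broker,
        # so requests are acknowledged in the order they were sent.
        self._io = ThreadPoolExecutor(max_workers=1)
        self._offsets = itertools.count()
        self._round_trip_s = round_trip_s

    def _send_and_wait_for_ack(self, message: bytes) -> int:
        time.sleep(self._round_trip_s)  # stand-in for network + broker latency
        return next(self._offsets)      # offset assigned by the "broker"

    def send(self, message: bytes) -> Future:
        # Never blocks the caller; the offset arrives together with the ack.
        return self._io.submit(self._send_and_wait_for_ack, message)

    def close(self) -> None:
        self._io.shutdown(wait=True)


producer = ToyProducer()

# Synchronous style: block on the future immediately.
first = producer.send(b"a").result()

# Asynchronous style: issue several sends and only wait at the end.
pending = [producer.send(m) for m in (b"b", b"c", b"d")]
offsets = [f.result() for f in pending]

producer.close()
print(first, offsets)  # 0 [1, 2, 3]
```

Handing the offset back only through the future is what lets one API serve both styles: a caller that never calls `result()` is not blocked by the round trip at send time.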

However this requires a fairly large change in the client which we haven't
done yet.

So in 0.8 synchronous producer performance will decrease. Asynchronous
production should probably not be too much worse, because async production
and batching mask and amortize the latency. It is too early to say how much
worse it will be, as there are still a few perf issues to resolve.
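A back-of-the-envelope sketch of why batching hides the ack latency, assuming a simplified model in which the producer has only one request in flight and every round trip costs a fixed 2 ms (an illustrative assumption, not a measurement from this thread); bandwidth and broker processing time are ignored:

```python
import math


def round_trips(num_messages: int, batch_size: int) -> int:
    """Requests needed when each request carries up to batch_size messages
    and the producer waits for each ack before sending the next request."""
    return math.ceil(num_messages / batch_size)


def max_throughput(batch_size: int, round_trip_ms: int) -> float:
    """Latency-only upper bound in messages/second: one batch per round trip."""
    return batch_size * 1000 / round_trip_ms


RTT_MS = 2  # assumed producer-to-broker round trip (illustrative only)

print(round_trips(10_000, 1))       # 10000: one ack wait per message
print(round_trips(10_000, 200))     # 50: one ack wait per 200-message batch
print(max_throughput(1, RTT_MS))    # 500.0 msg/s
print(max_throughput(200, RTT_MS))  # 100000.0 msg/s
```

Each request still pays the full ack latency; batching only spreads that cost over more messages, which is why async production should suffer much less than sync production.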

It would be reasonable if people were a little annoyed by this, since we are
effectively making the software worse on some dimensions before we make it
better. Our reasoning was that batching up even more changes into a single
release was just too dangerous. People who care about replication will
(hopefully) care enough about it that taking a hit on sync producer
performance will be okay, and people who don't care about replication can
just skip this version, since replication is the major feature in 0.8.

-Jay
On Tue, Jan 22, 2013 at 7:10 PM, S Ahmed <[EMAIL PROTECTED]> wrote:

> Neha,
>
> I see, so that is a fairly substantial change. Of course it has the
> advantage of guaranteeing a higher degree of durability, but at a
> significant cost (the round trip that the producer has to wait for). I know
> someone mentioned creating an async producer with a future.
>
> Do you have a 'gut' feeling whether performance will be the same as in 0.7
> or x% slower? (Or do you have no idea yet, since you guys are still going
> to work on perf?)
>
>
> On Fri, Jan 18, 2013 at 8:42 PM, Neha Narkhede <[EMAIL PROTECTED]> wrote:
>
> > >> producer.num.acks=0
> >
> > There is still a difference between the 0.7 and 0.8 Kafka behavior, in
> > the sense that in 0.7 the producer fired away requests at the broker
> > without waiting for an ack. In 0.8, even with num.acks=0, the producer
> > writes are going to be synchronous, and it won't be able to send the next
> > request until the ack for the previous one comes back.
> >
> > Thanks,
> > Neha
> >
> >
> > On Fri, Jan 18, 2013 at 12:24 PM, S Ahmed <[EMAIL PROTECTED]> wrote:
> >
> > > I see. OK, so if you wanted to compare 0.7 with 0.8 on the same footing,
> > > you would set it to 0, right? (Since 0.7 is fire-and-forget.)
> > >
> > > producer.num.acks=0
> > >
> > >
> > > On Thu, Jan 17, 2013 at 11:45 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
> > >
> > > > It means waiting until the data reaches all replicas (that are in sync).
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Thu, Jan 17, 2013 at 6:42 PM, S Ahmed <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Sorry, what does producer.num.acks=-1 mean? Does it mean that all
> > > > > replicas are written to?
> > > > >
> > > > >
> > > > > On Thu, Jan 17, 2013 at 12:09 PM, Neha Narkhede <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > Looks like Jun's email didn't format the output properly. I've
> > > > > > published some preliminary producer throughput performance numbers
> > > > > > on our performance wiki:
> > > > > >
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/Performance+testing#Performancetesting-Producerthroughput
> > > > > >
> > > > > > These tests measure producer throughput in the worst-case scenario
> > > > > > (producer.num.acks=-1), i.e. the maximum durability setting. The baseline