Kafka, mail # user - 0.8.0 HEAD 3/4/2013 performance jump?


Chris Curtin 2013-03-04, 20:01
Neha Narkhede 2013-03-04, 20:07
Jun Rao 2013-03-05, 06:01
Chris Curtin 2013-03-05, 13:30
Joe Stein 2013-03-05, 14:37
Chris Curtin 2013-03-05, 15:56
Re: 0.8.0 HEAD 3/4/2013 performance jump?
Jun Rao 2013-03-05, 16:14
Chris, Joe,

Yes, the default ack is currently 0. Let me explain the ack modes a bit more
so that we are on the same page (details are covered in my ApacheCon
presentation:
http://www.slideshare.net/junrao/kafka-replication-apachecon2013). There
are only 3 ack modes that make sense.

ack=0: producer waits until the message is in the producer's socket buffer
ack=1: producer waits until the message is received by the leader
ack=-1: producer waits until the message is committed

The tradeoffs are:

ack=0: lowest latency; some data loss during broker failure
ack=1: lower latency; a small amount of data loss during broker failure
ack=-1: low latency; no data loss during broker failure
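
For reference, the three modes map onto the request.required.acks producer property roughly as sketched below. The AckMode enum and its method names are hypothetical (they are not part of Kafka); only the property name and the values 0, 1 and -1 come from this thread.

```java
import java.util.Properties;

// Hypothetical helper, not part of Kafka: names the three ack modes
// discussed in this thread and turns one of them into producer properties.
enum AckMode {
    NONE(0, "in the producer's socket buffer"),
    LEADER(1, "received by the leader"),
    COMMITTED(-1, "committed");

    final int value;          // value passed as request.required.acks
    final String waitsUntil;  // what the producer waits for in this mode

    AckMode(int value, String waitsUntil) {
        this.value = value;
        this.waitsUntil = waitsUntil;
    }

    // Sets only request.required.acks; every other producer setting
    // still has to be supplied by the caller.
    Properties toProperties() {
        Properties props = new Properties();
        props.put("request.required.acks", Integer.toString(value));
        return props;
    }
}

class AckModeDemo {
    public static void main(String[] args) {
        // Print one line per mode, mirroring the list above.
        for (AckMode mode : AckMode.values()) {
            System.out.println("ack=" + mode.value
                    + ": producer waits until the message is " + mode.waitsUntil);
        }
    }
}
```

An application that cares more about durability than speed would pass AckMode.COMMITTED.toProperties() (merged with its other settings) when building its producer configuration.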

All cases work with replication factor 1, which is the default setting out
of the box. With ack=1/-1, the producer may see some errors when a leader
hasn't been elected yet. However, the number of errors should be small
since leaders are typically elected very quickly.
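
Because such errors should be short-lived, an application running with ack=1 or -1 could retry a failed send a few times before giving up. A minimal sketch, assuming nothing about the producer API: sendWithRetry, maxAttempts and backoffMs are hypothetical names, and the Callable stands in for whatever call actually sends the message.

```java
import java.util.concurrent.Callable;

// Hypothetical application-side retry wrapper; not part of Kafka.
class RetrySketch {
    // Runs `send` up to maxAttempts times, sleeping backoffMs between
    // failed attempts, and rethrows the last failure if none succeeds.
    static <T> T sendWithRetry(Callable<T> send, int maxAttempts, long backoffMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return send.call();
            } catch (InterruptedException e) {
                // Do not retry on interruption; restore the flag and stop.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Give the cluster a moment, e.g. to finish electing a leader.
                    Thread.sleep(backoffMs);
                }
            }
        }
        throw last;
    }
}
```

With ack=0 a wrapper like this would not help, since the producer does not wait for the leader's response and so has no error to react to.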

The argument for making the default ack 0 is that (1) this is the same
behavior you get in 0.7 and (2) the producer runs fastest in this mode.

The argument for making the default ack 1 or -1 is that they give you
better reliability.

I am not sure what the best thing to do here is, since the correct setting
really depends on the application. What do people feel?

Thanks,

Jun
On Tue, Mar 5, 2013 at 6:36 AM, Joe Stein <[EMAIL PROTECTED]> wrote:

> Hi Chris, setting the ack default to 1 would mean folks would have to have
> a replica set up and configured; otherwise, starting a server from scratch
> from a download would mean an error message for the user. I hear your point
> about the risk of not replicating, though perhaps such a use case would be
> solved through auto discovery or some other feature/contribution for 0.9.
>
> I would be -1 on changing the default right now because new folks coming in
> on a build, either as new users or as migrations, might simply leave because
> they got an error, even when just running git clone, ./sbt package and
> starting up (fewer steps in 0.8). There are already expectations on 0.8, and
> we should try to keep things settled too.
>
> Lastly, when folks go live they will often have a Chef, CFEngine, Puppet,
> etc. script for configuration.
>
> Perhaps through some more operation documentation, comments and general
> communications to the community we can reduce risk.
>
> /*
> Joe Stein
> http://www.linkedin.com/in/charmalloc
> Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> */
>
> On Tue, Mar 5, 2013 at 8:30 AM, Chris Curtin <[EMAIL PROTECTED]>
> wrote:
>
> > Hi Jun,
> >
> > I wasn't explicitly setting the ack anywhere.
> >
> > Am I reading the code correctly that in SyncProducerConfig.scala the
> > DefaultRequiredAcks is 0? Thus not waiting on the leader?
> >
> > Setting:  props.put("request.required.acks", "1"); causes the writes to go
> > back to the performance I was seeing before yesterday.
> >
> > Are you guys open to changing the default to be 1? The MongoDB Java-driver
> > guys made a similar default change at the end of last year because many
> > people didn't understand the risk that the default value of no-ack was
> > putting them in until they had a node failure. So they default to 'safe'
> > and let you decide what your risk level is vs. assuming you can lose data.
> >
> > Thanks,
> >
> > Chris
> >
> >
> >
> > On Tue, Mar 5, 2013 at 1:00 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
> >
> > > Chris,
> > >
> > > On the producer side, are you using ack=0? Earlier, ack=0 was the same as
> > > ack=1, which meant that the producer had to wait for the message to be
> > > received by the leader. More recently, we did the actual implementation of
> > > ack=0, which means the producer doesn't wait for the message to reach the
> > > leader and therefore it is much faster.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Mon, Mar 4, 2013 at 12:01 PM, Chris Curtin <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi,
> > > >
> > > > I'm definitely not complaining, but after upgrading to HEAD today my

 
Colin Blower 2013-03-05, 16:19
Chris Curtin 2013-03-05, 16:30
Neha Narkhede 2013-03-05, 16:46