Kafka user mailing list: Analysis of producer performance


Messages in this thread:
  Piotr Kozikowski    2013-04-08, 23:43
  Jun Rao             2013-04-09, 04:49
  Guy Doulberg        2013-04-09, 06:34
  Piotr Kozikowski    2013-04-09, 17:23
  Otis Gospodnetic    2013-04-10, 19:05
  Piotr Kozikowski    2013-04-10, 20:11
  Yiu Wing TSANG      2013-04-11, 02:47
  Jun Rao             2013-04-11, 05:18
  Piotr Kozikowski    2013-04-12, 00:46
Re: Analysis of producer performance
Another way to handle this is to provision enough client and broker servers
so that the peak load can be handled without spooling.

Thanks,

Jun
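
A back-of-envelope sketch of the provisioning arithmetic suggested above. Every number is an invented placeholder for figures you would measure on your own hardware:

// Minimal capacity sketch for "provision for the peak, no spooling".
// All numbers are invented placeholders; measure your own workload.
public class CapacityPlan {
    public static void main(String[] args) {
        double peakMBPerSec = 300.0;    // assumed peak produce rate across all webapp servers
        double brokerMBPerSec = 50.0;   // assumed sustained write throughput of one broker
        double headroom = 1.5;          // safety factor for replication, failover, hot partitions
        int brokersNeeded = (int) Math.ceil(peakMBPerSec * headroom / brokerMBPerSec);
        System.out.println("Brokers needed at peak: " + brokersNeeded); // prints 9
    }
}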
On Thu, Apr 11, 2013 at 5:45 PM, Piotr Kozikowski <[EMAIL PROTECTED]> wrote:

> Jun,
>
> When talking about "catastrophic consequences" I was actually only
> referring to the producer side. In our use case (logging requests from
> webapp servers), a spike in traffic would force us to either tolerate a
> dramatic increase in the response time, or drop messages, both of which are
> really undesirable. Hence the need to absorb spikes with some system on top
> of Kafka, unless the spooling feature mentioned by Wing (
> https://issues.apache.org/jira/browse/KAFKA-156) is implemented. This is
> assuming there are a lot more producer machines than broker nodes, so each
> producer would absorb a small part of the extra load from the spike.
>
> Piotr
>
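To make the tradeoff concrete, here is a hypothetical sketch of a "system on top of Kafka": a bounded buffer between the webapp threads and the producer. When the queue fills during a spike, the caller faces exactly the choice described above, block (latency) or drop (loss). It uses the modern Java producer API for illustration rather than the 0.8 one discussed in this thread, and every name and size is invented:

import java.util.Properties;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SpoolingSender implements Runnable {
    private final BlockingQueue<String> buffer = new ArrayBlockingQueue<>(100_000);
    private final Producer<String, String> producer;
    private final String topic;

    public SpoolingSender(String bootstrapServers, String topic) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        this.producer = new KafkaProducer<>(props);
        this.topic = topic;
    }

    // Called from webapp request threads; returns false if the message was
    // dropped because the buffer stayed full (the "drop" half of the tradeoff).
    public boolean enqueue(String message) throws InterruptedException {
        return buffer.offer(message, 10, TimeUnit.MILLISECONDS);
    }

    // Background drain thread: forwards to Kafka as fast as the brokers accept.
    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                producer.send(new ProducerRecord<>(topic, buffer.take()));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            producer.close();
        }
    }
}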
> On Wed, Apr 10, 2013 at 10:17 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
>
> > Piotr,
> >
> > Actually, could you clarify what "catastrophic consequences" you saw on
> > the broker side? Do clients time out due to longer serving time, or
> > something else?
> >
> > Going forward, we plan to add per-client quotas (KAFKA-656) to prevent
> > the brokers from being overwhelmed by a runaway client.
> >
> > Thanks,
> >
> > Jun
> >
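For what it is worth, per-client quotas did eventually ship as broker-side throttling (Kafka 0.9 and later). As a sketch of the eventual shape, long after this thread, the Java Admin API (Kafka 2.6+) can cap a producer's byte rate per client id; the client name and rate below are invented:

import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.common.quota.ClientQuotaAlteration;
import org.apache.kafka.common.quota.ClientQuotaEntity;

public class ThrottleClient {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            // Cap the client id "runaway-client" (invented) to ~1 MB/s of produce traffic.
            ClientQuotaEntity entity = new ClientQuotaEntity(
                Map.of(ClientQuotaEntity.CLIENT_ID, "runaway-client"));
            ClientQuotaAlteration alteration = new ClientQuotaAlteration(
                entity,
                Collections.singletonList(
                    new ClientQuotaAlteration.Op("producer_byte_rate", 1_048_576.0)));
            admin.alterClientQuotas(Collections.singletonList(alteration)).all().get();
        }
    }
}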
> >
> > On Wed, Apr 10, 2013 at 12:04 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> >
> > > Hi,
> > >
> > > Is there anything one can do to "defend" from:
> > >
> > > "Trying to push more data than the brokers can handle for any sustained
> > > period of time has catastrophic consequences, regardless of what
> timeout
> > > settings are used. In our use case this means that we need to either
> > ensure
> > > we have spare capacity for spikes, or use something on top of Kafka to
> > > absorb spikes."
> > >
> > > ?
> > > Thanks,
> > > Otis
> > > ----
> > > Performance Monitoring for Solr / ElasticSearch / HBase -
> > > http://sematext.com/spm
> > >
> > >
> > >
> > >
> > >
> > > >________________________________
> > > > From: Piotr Kozikowski <[EMAIL PROTECTED]>
> > > >To: [EMAIL PROTECTED]
> > > >Sent: Tuesday, April 9, 2013 1:23 PM
> > > >Subject: Re: Analysis of producer performance
> > > >
> > > >Jun,
> > > >
> > > >Thank you for your comments. I'll reply point by point for clarity.
> > > >
> > > >1. We were aware of the migration tool, but since we haven't used Kafka
> > > >in production yet, we just started using the 0.8 version directly.
> > > >
> > > >2. I hadn't seen those particular slides; very interesting. I'm not sure
> > > >we're testing the same thing, though. In our case we vary the number of
> > > >physical machines, but each one has 10 threads accessing a pool of Kafka
> > > >producer objects, and in theory a single machine is enough to saturate
> > > >the brokers (which our test mostly confirms). Also, assuming that the
> > > >slides are based on the built-in producer performance tool, I know that
> > > >we started getting very different numbers once we switched to using
> > > >"real" (actual production log) messages. Compression may also be a
> > > >factor in case it wasn't configured the same way in those tests.
> > > >
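For reference, a hypothetical sketch of the kind of shared producer-object pool the test setup above describes (e.g. 10 worker threads per machine borrowing from it). Pool size and all names are invented, and it uses the modern Java producer API rather than the 0.8 one:

import java.util.Properties;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerPool {
    private final BlockingQueue<Producer<String, String>> pool;

    public ProducerPool(int size, Properties producerProps) {
        this.pool = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            pool.add(new KafkaProducer<>(producerProps));
        }
    }

    // A worker thread borrows a producer, sends, and always returns it.
    public void send(String topic, String message) throws InterruptedException {
        Producer<String, String> producer = pool.take();
        try {
            producer.send(new ProducerRecord<>(topic, message));
        } finally {
            pool.add(producer);
        }
    }
}

(With the modern client a single KafkaProducer instance is thread-safe, so a pool like this mainly mirrors the 0.8-era setup the message describes.)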
> > > >3. In the latency section, there are two tests, one for average and
> > > >another for maximum latency. Each one has two graphs presenting the
> > > >exact same data but at different levels of zoom. The first is for
> > > >observing small variations in latency when target throughput <= actual
> > > >throughput. The second is for observing the overall shape of the graph
> > > >once latency starts growing when target throughput > actual throughput.
> > > >I hope that makes sense.
> > > >
> > > >4. That sounds great, looking forward to it.
> > > >
> > > >Piotr
> > > >
> > > >On Mon, Apr 8, 2013 at 9:48 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
> > > >
> > > >> Piotr,
> > > >>
> > > >> Thanks for sharing this. Very interesting and useful study. A few

 
More messages in this thread:
  Piotr Kozikowski    2013-04-12, 23:09
  Jun Rao             2013-04-15, 01:06
  Philip OToole       2013-04-12, 15:22
  S Ahmed             2013-04-12, 15:28
  Philip OToole       2013-04-12, 15:59
  Philip OToole       2013-04-12, 17:04
  Piotr Kozikowski    2013-04-15, 18:19
  David Arthur        2013-04-23, 12:22