Kafka >> mail # user >> Analysis of producer performance

Re: Analysis of producer performance -- and Producer-Kafka reliability
This is just my opinion of course (who else's could it be? :-)) but I think
from an engineering point of view, one must spend one's time making the
Producer-Kafka connection solid, if it is mission-critical.

Kafka is all about getting messages to disk, and assuming your disks are
solid (and 0.8 has replication) those messages are safe. To then try to
build a system to cope with the Kafka brokers being unavailable seems like
you're setting yourself up for an infinite regress. And to write code in the
Producer to spool to disk seems even more pointless. If you're that
worried, why not run a dedicated Kafka broker on the same node as the
Producer, and connect over localhost? To turn around and write code to
spool to disk, because the primary system that *spools to disk* is down
seems to be missing the point.

That said, even when going over localhost, I guess the network connection
could go down. In that case, Producers should buffer in RAM, and start
sending some major alerts to the Operations team. But this should almost
*never happen*. If it is happening regularly *something is fundamentally
wrong with your system design*. Those Producers should also refuse any more
incoming traffic and await intervention. Even bringing up "netcat -l" and
letting it suck in the data and write it to disk would work then.
Alternatives include having Producers connect to a load-balancer with
multiple Kafka brokers behind it, which helps you deal with any one Kafka
broker failing. Or just have your Producers connect directly to multiple
Kafka brokers, and switch over as needed if any one broker goes down.

I don't know if the standard Kafka producer that ships with Kafka supports
buffering in RAM in an emergency. We wrote our own that does, with a focus
on speed and simplicity, but I expect it will very rarely, if ever, buffer
in RAM.
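As a rough illustration of the behavior described above (buffer in RAM while the broker is unreachable, refuse incoming traffic once the buffer is full, and flag the condition so Operations can be alerted), a minimal sketch might look like the following. This is a hypothetical illustration, not the Kafka producer API; the class, the `broker_up` flag, and the threshold are all invented for the example:

```python
from collections import deque

class BufferingProducer:
    """Hypothetical sketch: buffer messages in RAM when the broker is
    unreachable, refuse new traffic once the buffer is full, and set a
    flag so Operations can be alerted. Not a real Kafka API."""

    def __init__(self, max_buffered=10000):
        self.max_buffered = max_buffered
        self.buffer = deque()
        self.alerted = False  # would trigger a page/alert in practice

    def send(self, message, broker_up):
        """Returns True if the message was accepted, False if refused."""
        if broker_up:
            self._flush()          # drain anything buffered first
            self._deliver(message)
            return True
        if len(self.buffer) >= self.max_buffered:
            # Buffer full: refuse further traffic and await intervention.
            self.alerted = True
            return False
        self.buffer.append(message)
        return True

    def _flush(self):
        while self.buffer:
            self._deliver(self.buffer.popleft())

    def _deliver(self, message):
        # Stand-in for the real network send to the broker.
        pass
```

A real implementation would drain the buffer from a background thread on reconnect and emit an actual alert rather than just setting a flag, but the core policy is the same: bounded RAM buffering, then backpressure.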

Building and chaining together one semi-reliable system after another,
hoping to become more tolerant of failure, is not necessarily a good
approach. Instead, identifying the one system that is critical, and
ensuring that it remains up (redundant installations, redundant disks,
redundant network connections, etc.) is a better approach.

On Fri, Apr 12, 2013 at 7:54 AM, Jun Rao <[EMAIL PROTECTED]> wrote:

> Another way to handle this is to provision enough client and broker servers
> so that the peak load can be handled without spooling.
> Thanks,
> Jun
> On Thu, Apr 11, 2013 at 5:45 PM, Piotr Kozikowski <[EMAIL PROTECTED]
> >wrote:
> > Jun,
> >
> > When talking about "catastrophic consequences" I was actually only
> > referring to the producer side. In our use case (logging requests from
> > webapp servers), a spike in traffic would force us to either tolerate a
> > dramatic increase in the response time, or drop messages, both of which
> are
> > really undesirable. Hence the need to absorb spikes with some system on
> top
> > of Kafka, unless the spooling feature mentioned by Wing (
> > https://issues.apache.org/jira/browse/KAFKA-156) is implemented. This is
> > assuming there are a lot more producer machines than broker nodes, so
> each
> > producer would absorb a small part of the extra load from the spike.
> >
> > Piotr
> >
> > On Wed, Apr 10, 2013 at 10:17 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
> >
> > > Piotr,
> > >
> > > Actually, could you clarify what "catastrophic consequences" did you
> see
> > on
> > > the broker side? Do clients timeout due to longer serving time or
> > something
> > > else?
> > >
> > > Going forward, we plan to add per client quotas (KAFKA-656) to prevent
> > the
> > > brokers from being overwhelmed by a runaway client.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Wed, Apr 10, 2013 at 12:04 PM, Otis Gospodnetic <
> > > [EMAIL PROTECTED]> wrote:
> > >
> > > > Hi,
> > > >
> > > > Is there anything one can do to "defend" from:
> > > >
> > > > "Trying to push more data than the brokers can handle for any
> sustained
> > > > period of time has catastrophic consequences, regardless of what