Kafka, mail # user - Analysis of producer performance

Re: Analysis of producer performance -- and Producer-Kafka reliability
David Arthur 2013-04-23, 12:22
It seems there are two underlying things here: storing messages to
stable storage, and making messages available to consumers (i.e.,
storing messages on the broker). One can be achieved simply and reliably
by spooling to local disk, the other requires network and is inherently
less reliable. Buffering messages in memory does not help with the first
one since they are in volatile storage, but it does help with the second
one in the event of a network partition.

I could imagine a producer running in "ultra-reliability" mode where it
uses a local log file as a buffer that all messages are written to and
read from. One issue with this, though, is that now you have to worry about
the performance and capacity of the disks on your producers (which can
be numerous compared to brokers). As for performance, the data being
written by producers is already in active memory, so writing it to a
disk then doing a zero-copy transfer to the network should be pretty
fast (maybe?).
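As a rough illustration of that spool-file idea (a sketch only, not Kafka's producer API: the spool path and the send() callable passed to drain() are hypothetical stand-ins):

```python
import os

SPOOL_PATH = "/tmp/producer.spool"  # hypothetical spool location

def spool(message: bytes) -> None:
    """Append a length-prefixed message to the local spool file."""
    with open(SPOOL_PATH, "ab") as f:
        f.write(len(message).to_bytes(4, "big") + message)
        f.flush()
        os.fsync(f.fileno())  # force to stable storage before acknowledging

def drain(send) -> None:
    """Replay spooled messages through the supplied send() callable,
    then discard the spool once everything has been handed off."""
    if not os.path.exists(SPOOL_PATH):
        return
    with open(SPOOL_PATH, "rb") as f:
        while header := f.read(4):
            size = int.from_bytes(header, "big")
            send(f.read(size))
    os.remove(SPOOL_PATH)
```

The fsync per message is what makes this "stable storage" rather than just another volatile buffer, and it is also exactly where the producer-disk performance worry above comes in.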

Or, Kafka can remain more "protocol-ish" and less "application-y" and
just give you errors when brokers are unavailable and let your
application deal with it. This is basically what TCP/HTTP/etc do. HTTP
servers don't say "hold on, there's a problem, let me try that request
again in a second.."
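In that model the retry policy lives in the application, much as it does for an HTTP client. A minimal sketch, assuming a hypothetical send() that raises ConnectionError while brokers are unavailable:

```python
import time

def send_with_retry(send, message, retries=5, backoff=0.5):
    """Application-side retry with exponential backoff; the transport
    just reports failure and leaves the decision to the caller."""
    for attempt in range(retries):
        try:
            return send(message)
        except ConnectionError:
            if attempt == retries - 1:
                raise  # surface the failure to the application
            time.sleep(backoff * 2 ** attempt)
```

Whether to retry, spool, drop, or alert on that final exception is then an application decision, not something baked into the protocol layer.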

Interesting discussion, btw :)

On 4/15/13 2:18 PM, Piotr Kozikowski wrote:
> Philip,
> We would not use spooling to local disk on the producer to deal with
> problems with the connection to the brokers, but rather to absorb temporary
> spikes in traffic that would overwhelm the brokers. This is assuming that
> 1) those spikes are relatively short, but when they come they require much
> higher throughput than normal (otherwise we'd just have a capacity problem
> and would need more brokers), and 2) the spikes are long enough for just a
> RAM buffer to be dangerous. If the brokers did go down, spooling to disk
> would give us more time to react, but that's not the primary reason for
> wanting the feature.
> -Piotr
> On Fri, Apr 12, 2013 at 8:21 AM, Philip O'Toole <[EMAIL PROTECTED]> wrote:
>> This is just my opinion of course (who else's could it be? :-)) but I think
>> from an engineering point of view, one must spend one's time making the
>> Producer-Kafka connection solid, if it is mission-critical.
>> Kafka is all about getting messages to disk, and assuming your disks are
>> solid (and 0.8 has replication) those messages are safe. To then try to
>> build a system to cope with the Kafka brokers being unavailable seems like
>> you're setting yourself up for infinite regress. And to write code in the
>> Producer to spool to disk seems even more pointless. If you're that
>> worried, why not run a dedicated Kafka broker on the same node as the
>> Producer, and connect over localhost? To turn around and write code to
>> spool to disk, because the primary system that *spools to disk* is down
>> seems to be missing the point.
>> That said, even by going over localhost, I guess the network connection
>> could go down. In that case, Producers should buffer in RAM, and start
>> sending some major alerts to the Operations team. But this should almost
>> *never happen*. If it is happening regularly *something is fundamentally
>> wrong with your system design*. Those Producers should also refuse any more
>> incoming traffic and await intervention. Even bringing up "netcat -l" and
>> letting it suck in the data and write it to disk would work then.
>> Alternatives include having Producers connect to a load-balancer with
>> multiple Kafka brokers behind it, which helps you deal with any one Kafka
>> broker failing. Or just have your Producers connect directly to multiple
>> Kafka brokers, and switch over as needed if any one broker goes down.
>> I don't know if the standard Kafka producer that ships with Kafka supports
>> buffering in RAM in an emergency. We wrote our own that does, with a focus
>> on speed and simplicity, but I expect it will very rarely, if ever, buffer