Stan Rosenberg 2013-01-15, 19:29
Corbin Hoenes 2013-01-15, 20:13
Stan Rosenberg 2013-01-15, 22:32
Jay Kreps 2013-01-15, 20:19
-Re: resilient producer
Stan Rosenberg 2013-01-15, 22:30
Thanks for your insight! More comments are below.
On Tue, Jan 15, 2013 at 3:18 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
> I can't speak for all users, but at LinkedIn we don't do this. We just run
> Kafka as a high-availability system (i.e. something not allowed to be
> down). These kind of systems require more care, but we already have a
> number of such data systems. We chose this approach because local queuing
> leads to disk/data management problems on all producers (and we have
> thousands) and also late data. Late data makes aggregation very hard since
> there will always be more data coming so the aggregate ends up not matching
> the base data.
Yep, we're facing the same problem with respect to late data. I'd like to
see alternative solutions to this problem, but I am afraid it's an
undecidable problem in general.
> This has lead us to a path of working on reliability of the service itself
> rather than a store-and-forward model.
Likewise the model itself doesn't necessarily work--as you get to thousands
> of producers, then some of those will likely go hard down if the cluster
> has non-trivial periods of non-availability, and the data you queued
> locally is gone since you have no fault-tolerance for that.
Right. So, you're essentially trading late data for potentially lost data?
> So that was our rationale, but you could easily go the other way. There is
> nothing in kafka that prevents producer-side queueing. I could imagine two
> possible implementations:
> 1. Many people who want this are basically doing log aggregation. If this
> is the case the collector process on the machine would just pause its
> collecting if the cluster is unavailable.
> 2. Alternately it would be possible to embed the kafka log (which is a
> standalone system) in the producer and use it for journalling in the case
> of errors. Then there could be a background thread that tries to push these
> stored messages out.
> 3. One could just catch any exceptions the producer throws and implement
> (2) external to the Kafka client.
Option 2 sounds promising.