Kafka >> mail # user >> resilient producer


Re: resilient producer
Jay,

Thanks for your insight!   More comments are below.

On Tue, Jan 15, 2013 at 3:18 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:

> I can't speak for all users, but at LinkedIn we don't do this. We just run
> Kafka as a high-availability system (i.e. something not allowed to be
> down). These kinds of systems require more care, but we already have a
> number of such data systems. We chose this approach because local queuing
> leads to disk/data management problems on all producers (and we have
> thousands) and also late data. Late data makes aggregation very hard since
> there will always be more data coming so the aggregate ends up not matching
> the base data.
>

Yep, we're facing the same problem with respect to late data.  I'd like to
see alternative solutions to this problem, but I am afraid it's an
undecidable problem in general.
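
To make the late-data problem concrete, here is a small sketch of what I mean. The event tuples and the `hourly_counts` helper are made up for illustration and are not part of Kafka; the point is only that an aggregate published when a window closes stops matching the base data once a locally queued event shows up.

```python
from collections import defaultdict


def hourly_counts(events):
    """Count events per hour bucket; each event is (event_hour, payload)."""
    counts = defaultdict(int)
    for event_hour, _payload in events:
        counts[event_hour] += 1
    return dict(counts)


# Events that reached the cluster on time.
on_time = [(10, "a"), (10, "b"), (11, "c")]
# An event produced during hour 10 but delivered later from a local queue.
late = [(10, "d")]

# Aggregate computed when hour 10 closed.
published = hourly_counts(on_time)
# The same aggregate recomputed over the full base data.
recomputed = hourly_counts(on_time + late)

print(published[10], recomputed[10])  # prints "2 3"
```

Whenever the aggregate is published, another late event can still arrive, so the two numbers can always drift apart again.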

> This has led us to a path of working on reliability of the service itself
> rather than a store-and-forward model.
>
> Likewise the model itself doesn't necessarily work--as you get to thousands
> of producers, then some of those will likely go hard down if the cluster
> has non-trivial periods of non-availability, and the data you queued
> locally is gone since you have no fault-tolerance for that.
>

Right.  So, you're essentially trading late data for potentially lost data?

> So that was our rationale, but you could easily go the other way. There is
> nothing in Kafka that prevents producer-side queueing. I could imagine three
> possible implementations:
> 1. Many people who want this are basically doing log aggregation. If this
> is the case the collector process on the machine would just pause its
> collecting if the cluster is unavailable.
> 2. Alternatively it would be possible to embed the Kafka log (which is a
> standalone system) in the producer and use it for journalling in the case
> of errors. Then there could be a background thread that tries to push these
> stored messages out.
> 3. One could just catch any exceptions the producer throws and implement
> (2) external to the Kafka client.
>

Option 2 sounds promising.
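
For discussion, here is a rough sketch of how I picture the catch-and-journal idea from options 2 and 3. Everything in it is an assumption on my side: `SpoolingProducer` is a hypothetical wrapper, `send` is a placeholder callable that raises when the cluster is unreachable (it is not the real Kafka producer API), and an in-memory deque stands in for the on-disk journal. That last shortcut means a producer crash loses the spool, which is exactly the weakness you describe above.

```python
import collections
import threading


class SpoolingProducer:
    """Hypothetical wrapper: journal messages locally when sending fails."""

    def __init__(self, send):
        # `send` is a placeholder callable that raises on failure;
        # it is an assumption, not the actual Kafka client interface.
        self._send = send
        # In-memory stand-in for a durable journal (lost on crash).
        self._spool = collections.deque()
        self._lock = threading.Lock()

    def publish(self, message):
        """Send directly if possible; otherwise spool. Returns True if sent."""
        with self._lock:
            if self._spool:
                # Best-effort ordering: queue behind earlier failed messages.
                self._spool.append(message)
                return False
        try:
            self._send(message)
            return True
        except Exception:
            with self._lock:
                self._spool.append(message)
            return False

    def drain(self):
        """Push spooled messages in order; stop at the first failure.

        Meant to be called from a single background thread.
        Returns the number of messages delivered.
        """
        sent = 0
        while True:
            with self._lock:
                if not self._spool:
                    return sent
                message = self._spool[0]
            try:
                self._send(message)
            except Exception:
                return sent
            with self._lock:
                self._spool.popleft()
            sent += 1
```

A background thread would call `drain()` on a timer. A message is removed from the spool only after `send` returns, so a failure mid-drain leaves it queued for the next attempt; the price is that a message may be sent twice if the failure happens after the cluster accepted it.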

 