search-hadoop.com: Kafka >> mail # user >> at-least-once guarantee?


Yang 2013-08-07, 23:01
Jay Kreps 2013-08-07, 23:26
Re: at-least-once guarantee?
Interesting... wouldn't the producer sequence grow without bound, in the
first case, even with the simpler non-HA key assumption, in order to provide
strict exactly-once semantics?

In other words, wouldn't you need to store the entire set of keys that the
broker has ever seen to ensure that a potentially replayed message doesn't
make it into the commit, given multiple producers?

In mps (github.com/milindparikh/mps), I use a rotating double Bloom filter
to provide "nearly exactly once" semantics and prevent unbounded growth of
such a sequence.

Regards
Milind
On Aug 7, 2013 4:26 PM, "Jay Kreps" <[EMAIL PROTECTED]> wrote:

> Yeah I'm not sure how good our understanding was when we wrote that.
>
> Here is my take now:
>
> At-least-once delivery is not that hard, but you need the ability to
> deduplicate things--basically you turn the "at-least-once delivery channel"
> into an "exactly-once channel" by throwing away duplicates. This means:
> 1. Some key assigned by the producer that allows the broker to detect a
> re-published message to make publishing idempotent. This solves the problem
> of producer retries. This key obviously has to be highly available--i.e. if
> the leader for a partition fails the follower must correctly deduplicate
> for all committed messages.
> 2. Some key that allows the consumer to detect a re-consumed message.
>
> The first item is actually pretty doable as we can track some producer
> sequence in the log and use it to avoid duplicate appends. We just need to
> implement it. I think this can be done in a way that is fairly low overhead
> and can be "on by default".
>
> We actually already provide such a key to the consumer--the offset. Making
> use of this is actually somewhat application dependent. Obviously providing
> exactly-once guarantees in the case of no failures is easy and we already
> handle that case. The harder part is ensuring, if a consumer process dies,
> that it restarts in a position that exactly matches the state changes it
> has made in some destination system. If the consumer application uses the
> offset in a way that makes updates idempotent, that will work; likewise if
> it commits its offset and data atomically. However, in general,
> the goal of a consumer is to produce some state change in another system (a
> db, hdfs, some other data system, etc) and having a general solution that
> works with all of these is hard since they have very different limitations
> and features.
>
> -Jay
>
>
> On Wed, Aug 7, 2013 at 4:00 PM, Yang <[EMAIL PROTECTED]> wrote:
>
> > I wonder why at-least-once guarantee is easier to maintain than
> > exactly-once (in that the latter requires 2PC while the former does not,
> > according to
> >
> >
> http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
> >  )
> >
> > If you achieve an at-least-once guarantee, you are able to distinguish
> > between the two cases "nothing delivered" vs. ">=1 delivered", which can
> > be seen as two different answers, 0 and 1. Isn't this as hard as the
> > classic Byzantine generals problem?
> >
> > Thanks
> > Yang
> >
>
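Jay's first point, a broker-side producer sequence tracked in the log to make publishing idempotent, can be sketched as below. This is a hypothetical illustration of the idea as discussed in the thread, not the actual Kafka implementation (which did not exist at the time); `PartitionLog` and its fields are invented for the example. It also answers Milind's question about unbounded growth: only the highest committed sequence per producer needs to be retained, not every key ever seen.

```python
# Hypothetical sketch of broker-side idempotent append using a
# per-producer sequence number. Illustrative names, not Kafka internals.
class PartitionLog:
    def __init__(self):
        self.messages = []      # the committed log for this partition
        self.last_seq = {}      # producer_id -> highest sequence appended

    def append(self, producer_id, seq, payload):
        """Append unless (producer_id, seq) was already committed.

        Returns the offset of a newly appended message, or None for a
        duplicate (a producer retry of an already-committed message).
        """
        if seq <= self.last_seq.get(producer_id, -1):
            # Retry of a committed message: deduplicate by dropping it.
            return None
        self.messages.append((producer_id, seq, payload))
        self.last_seq[producer_id] = seq
        return len(self.messages) - 1
```

For the high-availability requirement Jay mentions, the key observation is that `last_seq` is recoverable from the log itself, so a follower that has replicated all committed messages can rebuild it and keep deduplicating after a leader failure.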

 
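Jay's second point, the consumer side, can also be sketched. The example below shows the "commit offset and data atomically" strategy he describes, using SQLite as a stand-in destination system; the schema and `process` function are invented for illustration. Writing the state change and the new offset in one transaction, with an idempotent insert keyed by offset, means a replay after a crash changes nothing.

```python
# Illustrative sketch: a consumer commits its data and its offset in one
# transaction in the destination store, so replays are harmless.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (offset INTEGER PRIMARY KEY, value TEXT)")
conn.execute("CREATE TABLE consumer_offset "
             "(id INTEGER PRIMARY KEY CHECK (id = 1), next_offset INTEGER)")
conn.execute("INSERT INTO consumer_offset VALUES (1, 0)")


def process(offset, value):
    # INSERT OR IGNORE keyed by offset makes the update idempotent: a
    # replayed offset changes nothing, and the stored offset only moves
    # forward. Both writes commit (or roll back) together.
    with conn:
        conn.execute("INSERT OR IGNORE INTO events VALUES (?, ?)",
                     (offset, value))
        conn.execute("UPDATE consumer_offset "
                     "SET next_offset = MAX(next_offset, ?)", (offset + 1,))


process(0, "a")
process(0, "a")   # replay after a crash: no duplicate row
process(1, "b")
```

On restart, the consumer reads `next_offset` from the same store it writes to, which is exactly the "position that matches the state changes" Jay describes; as he notes, this only works when the destination system supports transactions or idempotent updates, which is why no fully general solution exists.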
Jay Kreps 2013-08-08, 02:56
Yang 2013-09-04, 06:07
Niek Sanders 2013-08-07, 23:28