Not if the system does ordered delivery and the id is a sequentially
increasing integer (mod something).

Essentially you need to keep
  (producer_id, topic, partition, max_seq_num)
on each broker. That is, rather than storing all keys you just need to know
the max you have seen if the seq_num is < max_seq_num+1 you have a
duplicate delivery. You can think of this as a persistent form of TCP's
sequence numbers, but instead of lasting only with the transient connection
it is a permanent part of the topic which is replicated with the message.

There are still a number of questions: How do producers get their ids? How
does the client initialize its sequence number? How do you keep the number
of producer ids bounded? etc.

If we did this it would be a ways out still. It would likely be an opt-in
client feature...i.e. clients that don't support it will work as normal. I
will try to get a more detailed design up in the next few weeks just so
there is something more concrete that tries to work out all the issues.

On Wed, Aug 7, 2013 at 5:50 PM, Milind Parikh <[EMAIL PROTECTED]>wrote:
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB