I understand that there are no guarantees per se against a message being a duplicate (it's the consumer's job to handle that), but when it comes to message order, is Kafka built in such a way that it is impossible to get messages in the wrong order?
Certain use cases might not be sensitive to order, but when order is very important, is Kafka the wrong tool for the job, or is there a way to meet this requirement?
Kafka preserves message order within a single partition. A simple example of how to take advantage of this behavior:
Suppose you're sending document updates through Kafka. If you use the document ID as the message key and the default hash partitioner, all updates for a given document will land on the same partition and reach the consumer in order.
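
To make the keying idea concrete, here is a minimal sketch that simulates it without a Kafka client. Everything in it is an assumption for illustration: the partition count, the names partition_for and route, and the use of CRC32 as a stand-in for whatever hash the real partitioner applies. It only shows that a stable hash of the document ID keeps each document's updates together and in send order.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count for the topic


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition with a stable hash.

    CRC32 is only a stand-in here; it is not the hash Kafka itself uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(updates):
    """Append each (doc_id, payload) pair to the partition chosen by its key."""
    partitions = defaultdict(list)
    for doc_id, payload in updates:
        partitions[partition_for(doc_id)].append((doc_id, payload))
    return partitions


# Interleaved updates for two documents, in the order they were sent.
updates = [
    ("doc-1", "v1"),
    ("doc-2", "v1"),
    ("doc-1", "v2"),
    ("doc-2", "v2"),
    ("doc-1", "v3"),
]
partitions = route(updates)

for partition_id, messages in sorted(partitions.items()):
    print(partition_id, messages)
```

Ordering holds per document only. Two different documents may share a partition or sit on different ones, and nothing here orders updates across documents.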
Another idea: if a set of messages arrives over a single TCP connection, route them to a partition based on that connection.
To be honest, these approaches, while they work, may not scale when the message rate is high. If at all possible, try to think of a way to remove this requirement from your design. For example, a design might assign a sequence number to each message before it goes into Kafka (a time-based UUID, for example), and have something later in the pipeline sort it all out. Kafka then does what it does best, IMHO: being a high-performance, reliable message bus.
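
A rough sketch of the sequence-number approach, again without a Kafka client. The names stamp and restore_order are hypothetical, and a plain in-process counter stands in for the sequence number; with several producers you would need something like the time-based UUID mentioned above, or a per-producer ID plus counter. The shuffle only simulates messages arriving out of order after being spread across partitions.

```python
import itertools
import random

# Hypothetical single-producer sequence source; a counter is the simplest case.
_sequence = itertools.count()


def stamp(payload):
    """Attach a sequence number to a message before it is sent."""
    return {"seq": next(_sequence), "payload": payload}


def restore_order(messages):
    """Downstream step: sort a batch of messages back into send order."""
    return sorted(messages, key=lambda m: m["seq"])


# Producer side: stamp messages in the order they are produced.
stamped = [stamp(p) for p in ["create", "edit", "rename", "delete"]]

# Simulate out-of-order arrival (fixed seed so the run is repeatable).
arrived = stamped[:]
random.Random(0).shuffle(arrived)

# Consumer side: recover the original order from the sequence numbers.
ordered = restore_order(arrived)
print([m["payload"] for m in ordered])
```

Sorting a batch like this assumes the downstream stage can buffer enough messages to see the gaps; how long to wait for a missing sequence number is a design decision this sketch leaves open.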
On Jun 14, 2013, at 7:37 AM, David Arthur <[EMAIL PROTECTED]> wrote: