I am not making any assumptions other than that Rabbit needs to maintain
the state of the consumers. As the Kafka docs point out, this is the
fundamental difference between Kafka and most other providers in the space.
Thinking about a high-throughput stream of messages with many active
consumers of different speeds, I am struggling to see how Rabbit can avoid
random I/O with all the acks. Each consumer’s state is certainly not stored
linearly on disk, so there would have to be seeks. Further, log-structured
merge trees are used in NoSQL stores like Cassandra precisely because
random I/O is expensive; they batch updates into sequential writes to avoid
it. Why do you feel ‘Rabbit does not do lots of random IO’?
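To make the bookkeeping difference I’m describing concrete, here is a
minimal sketch (my own illustration, not either broker’s actual
implementation, and the class names are made up): with per-message acks the
broker must track every outstanding delivery per consumer and acks arrive
in arbitrary order, whereas with an offset model the entire consumer state
is one cursor that only moves forward.

```python
# Per-message ack model: the broker tracks every unacked delivery for
# every consumer. Acks arrive in arbitrary order, so each ack mutates an
# arbitrary entry -- on disk that turns into scattered (random) updates.
class PerMessageAckState:
    def __init__(self):
        self.unacked = {}  # consumer -> set of outstanding message ids

    def deliver(self, consumer, msg_id):
        self.unacked.setdefault(consumer, set()).add(msg_id)

    def ack(self, consumer, msg_id):
        # removes one arbitrary id; ids 3 and 7 may live far apart on disk
        self.unacked[consumer].discard(msg_id)


# Offset model (what Kafka does per partition): consumer state is a single
# integer. An "ack" is just advancing a cursor, and reads up to that
# cursor are sequential scans of the log.
class OffsetState:
    def __init__(self):
        self.offsets = {}  # consumer -> next offset to read

    def ack_up_to(self, consumer, offset):
        # the cursor only moves forward; one small, append-friendly write
        self.offsets[consumer] = max(self.offsets.get(consumer, 0), offset)
```

The point of the sketch is the size and shape of the state: a set of
arbitrary ids per consumer versus one monotonically increasing integer per
consumer.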
Looking at some docs on the Rabbit site, they seem to mention that
performance degrades as the size of the persistent message store increases.
Too much random I/O could certainly explain this degradation.
http://www.rabbitmq.com/blog/2011/09/24/sizing-your-rabbits/
The use case I have been talking about all along is a continuous firehose
of data with throughput in the 100s of thousands of messages per second.
You will have 10-20 consumers of different speeds, ranging from real-time
(Storm) to batch (Hadoop). This means the message store is in the 100s of
GBs to terabytes range at all times.
On Sat, Jun 8, 2013 at 2:09 PM, Alexis Richardson <
[EMAIL PROTECTED]> wrote: