On Sat, Jun 8, 2013 at 2:09 AM, Jonathan Hodges <[EMAIL PROTECTED]> wrote:

Happy to help!

I'm afraid I don't follow your arguments below.  Rabbit contains many
optimisations too.  I'm told that it is possible to saturate the disk
i/o, and you saw the message rates I quoted in the previous email.
YES of course there are differences, mostly an accumulation of things.
 For example Rabbit spends more time doing work before it writes to

It would be great if you could detail some of the optimizations.  It
would seem to me Rabbit has much more overhead due to maintaining state of
the consumers as well as general messaging processing which makes it
impossible to manage the same write throughput as Kafka when you need to
persist large amounts of data to disk.  I definitely believe you that
Rabbit can saturate the disk but it is much more seek centric i.e. random
access read/writes vs sequential read/writes.  Kafka saturates the disk
too, but since it leverages sequential disk I/O it is orders of magnitude
more efficient at persisting to disk than random access.

You said:

"Since Rabbit must maintain the state of the
consumers I imagine it’s subjected to random data access patterns on disk
as opposed to sequential."

I don't follow the logic here, sorry.

Couple of side comments:

* In your Hadoop vs RT example, Rabbit would deliver the RT messages
immediately and write the rest to disk.  It can do this at high rates
- I shall try to get you some useful data here.

* Bear in mind that write speed should be orthogonal to read speed.
Ask yourself - how would Kafka provide a read cache, and when might
that be useful?

* I'll find out what data structure Rabbit uses for long term persistence.

What I am saying here is when Rabbit needs to retrieve and persist each
consumer’s state from its internal DB this information isn’t linearly
persisted on disk so it requires disk seeks, which are much less
efficient than sequential access.  You do get the difference here,
correct?  Sequential reads from disk are nearly 1.5x faster than random
reads from memory and 4-5 orders of magnitude faster than random reads from
disk (http://queue.acm.org/detail.cfm?id=1563874).
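To make the two access patterns concrete, here is a minimal sketch that
writes the same records once as a sequential append and once at shuffled
offsets.  The record size, record count and function names are
assumptions made up for illustration; this is not how Rabbit or Kafka
actually lay out data.

```python
import os
import random
import tempfile
import time

RECORD = b"x" * 4096   # one 4 KiB record (size is an arbitrary assumption)
COUNT = 2048           # number of records to write (also arbitrary)


def write_sequential(path):
    """Append records one after another, the way a log would."""
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(COUNT):
            f.write(RECORD)
        f.flush()
        os.fsync(f.fileno())   # push the data to the device before timing stops
    return time.perf_counter() - start


def write_random(path):
    """Write the same records at shuffled offsets, so each write needs a seek."""
    offsets = [i * len(RECORD) for i in range(COUNT)]
    random.shuffle(offsets)
    start = time.perf_counter()
    with open(path, "wb") as f:
        for off in offsets:
            f.seek(off)
            f.write(RECORD)
        f.flush()
        os.fsync(f.fileno())
    return time.perf_counter() - start


def run():
    """Run both patterns in a temp directory; return timings and file sizes."""
    with tempfile.TemporaryDirectory() as d:
        seq_path = os.path.join(d, "seq.dat")
        rnd_path = os.path.join(d, "rnd.dat")
        seq_t = write_sequential(seq_path)
        rnd_t = write_random(rnd_path)
        sizes = (os.path.getsize(seq_path), os.path.getsize(rnd_path))
    return seq_t, rnd_t, sizes
```

Calling run() returns the two elapsed times plus the resulting file
sizes.  On an SSD, or when the OS buffers the writes before the final
fsync, the gap may be small; the figures quoted in this thread refer to
spinning disks under sustained load.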

As was detailed at length in my previous post Kafka uses the OS
pagecache/sendfile which is much more efficient than memory or applications

That would be awesome if you can confirm what Rabbit is using as a
persistent data structure.  More importantly, whether it is BTree or
something else, is the disk i/o random or linear?

"Quoting the Kafka design page (
http://kafka.apache.org/07/design.html) performance of sequential writes on
a 6 7200rpm SATA RAID-5 array is about 300MB/sec but the performance of
random writes is only about 50k/sec—a difference of nearly 10000X."
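As a quick check of the quoted figures, the ratio can be worked out
directly.  This sketch assumes "300MB/sec" means decimal megabytes and
"50k/sec" means 50 kilobytes per second; the variable names are mine.

```python
import math

SEQ_BYTES_PER_SEC = 300 * 10**6   # "about 300MB/sec", decimal units assumed
RAND_BYTES_PER_SEC = 50 * 10**3   # "about 50k/sec", read as 50 KB/sec

# How many times faster the sequential figure is than the random one.
ratio = SEQ_BYTES_PER_SEC / RAND_BYTES_PER_SEC

# The same ratio expressed in orders of magnitude (powers of ten).
orders_of_magnitude = math.log10(ratio)

print(f"ratio: {ratio:.0f}x, about {orders_of_magnitude:.1f} orders of magnitude")
```

Under those assumptions the ratio works out to 6000x, a little under 4
orders of magnitude, which the design page rounds up to "nearly 10000X".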

Depending on your use case, I'd expect 2x-10x overall throughput
differences, and will try to find out more info.  As I said, Rabbit
can saturate disk i/o.

This is only speaking of the use case of high throughput with persisting
large amounts of data to disk, where the difference is closer to 4 orders
of magnitude than to 10x.  It all comes down to random vs sequential
writes/reads to disk as I mentioned above.

On Sat, Jun 8, 2013 at 2:07 AM, Alexis Richardson <