Kafka >> mail # user >> Arguments for Kafka over RabbitMQ ?
Re: Arguments for Kafka over RabbitMQ ?
Hi Alexis,

I appreciate your reply and your clarifications of my misconception about
Rabbit, particularly on the copying of message payloads per consumer.
It sounds like Rabbit only copies metadata such as consumer state, i.e. the
position in the topic's messages.  I don't have experience with Rabbit and
was basing this assumption on Google searches like the following:
http://ilearnstack.com/2013/04/16/introduction-to-amqp-messaging-with-rabbitmq/.
That page seems to indicate that with topic exchanges the messages get copied
to a queue per consumer, but I am glad you confirmed it is just the metadata.

While you are correct that the payload is a much bigger concern, managing the
metadata and acks centrally on the broker across multiple clients at scale
is also a concern.  This would seem to be exacerbated if you have consumers
at different speeds, e.g. Storm and Hadoop consuming the same topic.
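To make the contrast concrete, here is a minimal toy sketch (my own
illustration, not either broker's actual implementation) of the
Kafka-style model: the broker keeps an append-only log, and each consumer
tracks only its own offset, so a fast and a slow consumer can share one
topic without the broker holding per-message ack state for each of them.

```python
# Toy model of Kafka-style consumption: an append-only log plus
# per-consumer offsets.  No broker-side per-message ack bookkeeping.

class Log:
    def __init__(self):
        self.messages = []

    def append(self, msg):
        self.messages.append(msg)

    def read(self, offset, max_count):
        # Return up to max_count messages starting at offset.
        return self.messages[offset:offset + max_count]

class Consumer:
    def __init__(self, log):
        self.log = log
        self.offset = 0  # each consumer tracks its own position

    def poll(self, max_count=10):
        batch = self.log.read(self.offset, max_count)
        self.offset += len(batch)  # "acking" is just advancing the offset
        return batch

log = Log()
for i in range(5):
    log.append(f"event-{i}")

fast = Consumer(log)   # e.g. a real-time, Storm-like consumer
slow = Consumer(log)   # e.g. a once-a-day, Hadoop-like consumer

fast.poll()            # fast consumer reads everything available now
slow.poll(2)           # slow consumer lags behind at its own pace
print(fast.offset, slow.offset)  # 5 2
```

The point of the sketch is that a lagging consumer costs the broker
nothing beyond retaining the log itself, whereas centrally tracked
per-consumer ack state grows with clients and outstanding messages.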

In that scenario, say Storm consumes the topic messages in real time and
Hadoop consumes once a day.  Let's assume the topic handles 100k+
messages/sec of throughput, so that in a given day you might have hundreds
of GBs of data flowing through the topic.
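As a rough sanity check on those numbers (assuming, hypothetically, an
average message size around 100 bytes, since the original figures don't
state one):

```python
# Back-of-envelope: daily volume at 100k messages/sec.
# The ~100-byte average message size is an assumption for illustration.
msgs_per_sec = 100_000
seconds_per_day = 86_400
bytes_per_msg = 100  # hypothetical average payload size

daily_msgs = msgs_per_sec * seconds_per_day   # 8.64 billion messages/day
daily_gb = daily_msgs * bytes_per_msg / 1e9
print(f"{daily_gb:.0f} GB/day")  # 864 GB/day
```

So even small messages at that rate land in the hundreds-of-GBs-per-day
range, which is why the once-a-day consumer forces the broker to persist
rather than buffer in memory.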

To allow Hadoop to consume once a day, Rabbit obviously can't keep hundreds
of GBs in memory and will need to persist this data to its internal DB to be
retrieved later.  I believe this scenario, where large amounts of data need
to be persisted, is the one described in the earlier-posted Kafka paper (
http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf)
where Rabbit's performance really starts to bog down compared to Kafka.

That Kafka paper looks to be a few years old, so has something changed
within the Rabbit architecture to alleviate this issue when large amounts
of data are persisted to the internal DB?  Do the producer and consumer
numbers look correct?  If not, maybe you can share some Rabbit benchmarks
under this scenario, because I believe it is the main area where Kafka
appears to be the superior solution.

Thanks for educating me on these matters.

-Jonathan

On Fri, Jun 7, 2013 at 6:54 AM, Alexis Richardson <[EMAIL PROTECTED]>wrote: