Kafka >> mail # user >> Arguments for Kafka over RabbitMQ ?


Re: Arguments for Kafka over RabbitMQ ?
Hi Alexis,

I appreciate your reply and your clarification of my misconception about
Rabbit, particularly regarding the copying of message payloads per consumer.
It sounds like it only copies metadata such as the consumer state, i.e. the
position in the topic's messages.  I don’t have experience with Rabbit and
was basing this assumption on Google searches that turned up pages like the following:
http://ilearnstack.com/2013/04/16/introduction-to-amqp-messaging-with-rabbitmq/.
It seems to indicate that, with topic exchanges, the messages get copied to
a queue per consumer, but I am glad you confirmed it is just the metadata.

While you are correct that the payload is a much bigger concern, managing the
metadata and acks centrally on the broker across multiple clients at scale
is also a concern.  This would seem to be exacerbated if you have consumers
running at different speeds, e.g. Storm and Hadoop consuming the same topic.
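
As a rough illustration of the per-consumer position metadata discussed above, the sketch below models a topic as an append-only list and gives each consumer its own read position. It is a toy model under stated assumptions, not how Kafka or RabbitMQ is actually implemented; `TopicLog`, `Consumer`, and the "storm"/"hadoop" consumer names are hypothetical.

```python
class TopicLog:
    """Toy append-only log standing in for a topic (not a real broker)."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        self._messages.append(message)

    def read_from(self, position, max_count):
        # Return up to max_count messages starting at the given position.
        return self._messages[position:position + max_count]

    def __len__(self):
        return len(self._messages)


class Consumer:
    """Keeps its own position, so its only state is a single integer."""

    def __init__(self, name):
        self.name = name
        self.position = 0

    def poll(self, log, max_count):
        batch = log.read_from(self.position, max_count)
        self.position += len(batch)
        return batch

    def lag(self, log):
        # Number of messages appended but not yet read by this consumer.
        return len(log) - self.position


log = TopicLog()
fast = Consumer("storm")   # hypothetical real-time consumer
slow = Consumer("hadoop")  # hypothetical batch consumer

for i in range(1000):
    log.append(f"event-{i}")
    fast.poll(log, max_count=10)  # the fast consumer keeps up as data arrives

print("lag before batch read:", fast.lag(log), slow.lag(log))
batch = slow.poll(log, max_count=1000)  # the slow consumer catches up in one pass
print("lag after batch read:", fast.lag(log), slow.lag(log))
```

In this model the slow consumer's backlog costs nothing beyond the stored log itself, which is the property the fast/slow consumer scenario depends on.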

In that scenario, say Storm consumes the topic messages in real time and
Hadoop consumes once a day.  Let’s assume the topic has a throughput of 100k+
messages/sec, so that on a given day you might have hundreds of GB of
data flowing through the topic.
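
A back-of-the-envelope sketch of that volume follows. Only the 100k messages/sec rate comes from the scenario; the average message sizes are assumptions chosen for illustration, and `daily_volume_gb` is a hypothetical helper name.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_volume_gb(messages_per_sec, avg_message_bytes):
    """Return one day's data volume in decimal gigabytes (1 GB = 10**9 bytes)."""
    total_bytes = messages_per_sec * avg_message_bytes * SECONDS_PER_DAY
    return total_bytes / 10**9


# Assumed average message sizes in bytes; the rate is the one from the scenario.
for size in (50, 100, 500):
    print(f"{size:>4} B/msg -> {daily_volume_gb(100_000, size):,.0f} GB/day")
```

At an assumed 100 bytes per message, 100k messages/sec works out to 864 GB per day, which is consistent with the "hundreds of GB" estimate.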

To allow Hadoop to consume once a day, Rabbit obviously can’t keep hundreds of GB
in memory and will need to persist this data to its internal DB to be
retrieved later.  I believe this case, where large amounts of data need to be
persisted, is the scenario described in the Kafka paper posted earlier (
http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf)
where Rabbit’s performance really starts to bog down compared to Kafka.

This Kafka paper looks to be a few years old, so has something changed
within the Rabbit architecture to alleviate this issue when large amounts
of data are persisted to the internal DB?  Do the producer and consumer
numbers look correct?  If not, maybe you can share some Rabbit benchmarks
under this scenario, because I believe it is the main area where Kafka
appears to be the superior solution.

Thanks for educating me on these matters.

-Jonathan

On Fri, Jun 7, 2013 at 6:54 AM, Alexis Richardson <[EMAIL PROTECTED]> wrote:

> Hi
>
> Alexis from Rabbit here.  I hope I am not intruding!
>
> It would be super helpful if people with questions, observations or
> moans posted them to the rabbitmq list too :-)
>
> A few comments:
>
> * Along with ZeroMQ, I consider Kafka to be one of the interesting and
> useful messaging projects out there.  In a world of cruft, Kafka is
> cool!
>
> * This is because both projects come at messaging from a specific
> point of view that is *different* from Rabbit.  OTOH, many other
> projects exist that replicate Rabbit features for fun, or NIH, or due
> to misunderstanding the semantics (yes, our docs could be better)
>
> * It is striking how few people describe those differences.  In a
> nutshell they are as follows:
>
> *** Kafka writes all incoming data to disk immediately, and then
> figures out who sees what.  So it is much more like a database than
> Rabbit, in that new consumers can appear well after the disk write and
> still subscribe to past messages.  Rabbit, by contrast, tries to
> deliver to consumers and buffers otherwise.  Persistence is optional
> but robust and a feature of the buffer ("queue") not the upstream
> machinery.  Rabbit is able to cache-on-arrival via a plugin, but this
> is a design overlay and not particularly optimal.
>
> *** Kafka is a client server system with end to end semantics.  It
> defines order to include processing order, and keeps state on the
> client to do this.  Group management is via a 3rd party service
> (Zookeeper? I forget which).  Rabbit is a server-only protocol based
> system which maintains order on the server and through completely
> language neutral protocol semantics.  This makes Rabbit perhaps more
> natural as a 'messaging service' eg for integration and other
> inter-app data transfer.
>
> *** Rabbit is a general purpose messaging system with extras like
> federation.  It speaks many protocols, and has core features like HA,

 