Re: Arguments for Kafka over RabbitMQ ?
Sorry, I forgot to add this RabbitMQ link as well; it also seems to
indicate that messages are copied to multiple queues with a topic exchange.

http://www.rabbitmq.com/tutorials/tutorial-five-python.html

Maybe I am misunderstanding, and a topic exchange wouldn't be the approach
to take with Rabbit if you wanted to share the same stream of messages
across multiple consumers.
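For what it's worth, here is a minimal sketch of that tutorial's pattern
using the pika client (the exchange, queue names and routing keys below are
made up for illustration): each queue bound to the topic exchange receives
its own copy of a matching message, which is how one stream can feed
several independent consumers.

    import pika  # RabbitMQ Python client

    conn = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    ch = conn.channel()

    # One topic exchange with two queues bound using the same pattern:
    # every message matching 'events.*' is routed to (copied into) both.
    ch.exchange_declare(exchange='events', exchange_type='topic')
    for q in ('storm_feed', 'hadoop_feed'):  # hypothetical queue names
        ch.queue_declare(queue=q, durable=True)
        ch.queue_bind(exchange='events', queue=q, routing_key='events.*')

    ch.basic_publish(exchange='events', routing_key='events.click',
                     body='payload')  # delivered to both bound queues
    conn.close()
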
On Fri, Jun 7, 2013 at 12:03 PM, Jonathan Hodges <[EMAIL PROTECTED]> wrote:

> Hi Alexis,
>
> I appreciate your reply and clarifications to my misconception about
> Rabbit, particularly on the copying of the message payloads per consumer.
>  It sounds like it only copies metadata such as the consumer state, i.e.
> its position in the topic messages.  I don’t have experience with Rabbit
> and was basing this assumption on Google searches like the following -
> http://ilearnstack.com/2013/04/16/introduction-to-amqp-messaging-with-rabbitmq/.
>  It seems to indicate that with topic exchanges the messages get copied to
> a queue per consumer, but I am glad you confirmed it is just the metadata.
>
> While you are correct that the payload is a much bigger concern, managing
> the metadata and acks centrally on the broker across multiple clients at
> scale is also a concern.  This would seem to be exacerbated if you have
> consumers at different speeds, e.g. Storm and Hadoop consuming the same
> topic.
>
> In that scenario, say Storm consumes the topic messages in real time and
> Hadoop consumes once a day.  Let’s assume the topic sustains 100k+
> messages/sec of throughput, so that in a given day you might have 100s of
> GBs of data flowing through the topic.
>
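As a quick sanity check on those numbers (the average message size below is
an assumption for illustration, not a figure from the thread), a
back-of-the-envelope calculation does land in the hundreds of gigabytes per
day:

    # Rough daily volume for the scenario described above.
    msgs_per_sec = 100000           # 100k+ messages/sec, as stated
    avg_msg_bytes = 75              # assumed average payload size
    seconds_per_day = 24 * 60 * 60  # 86,400

    bytes_per_day = msgs_per_sec * avg_msg_bytes * seconds_per_day
    print(bytes_per_day / 1e9)      # ~648 GB/day, i.e. "100s of GBs"
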
> To allow Hadoop to consume once a day, Rabbit obviously can’t keep 100s of
> GBs in memory and will need to persist this data to its internal DB to be
> retrieved later.  I believe this case, where large amounts of data need to
> be persisted, is the scenario described in the earlier posted Kafka paper (
> http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf)
> where Rabbit’s performance really starts to bog down as compared to Kafka.
>
> This Kafka paper looks to be a few years old, so has something changed
> within the Rabbit architecture to alleviate this issue when large amounts
> of data are persisted to the internal DB?  Do the producer and consumer
> numbers look correct?  If not, maybe you can share some Rabbit benchmarks
> under this scenario, because I believe it is the main area where Kafka
> appears to be the superior solution.
>
> Thanks for educating me on these matters.
>
> -Jonathan
>
>
>
> On Fri, Jun 7, 2013 at 6:54 AM, Alexis Richardson <[EMAIL PROTECTED]> wrote:
>
>> Hi
>>
>> Alexis from Rabbit here.  I hope I am not intruding!
>>
>> It would be super helpful if people with questions, observations or
>> moans posted them to the rabbitmq list too :-)
>>
>> A few comments:
>>
>> * Along with ZeroMQ, I consider Kafka to be one of the interesting and
>> useful messaging projects out there.  In a world of cruft, Kafka is
>> cool!
>>
>> * This is because both projects come at messaging from a specific
>> point of view that is *different* from Rabbit.  OTOH, many other
>> projects exist that replicate Rabbit features for fun, or NIH, or due
>> to misunderstanding the semantics (yes, our docs could be better)
>>
>> * It is striking how few people describe those differences.  In a
>> nutshell they are as follows:
>>
>> *** Kafka writes all incoming data to disk immediately, and then
>> figures out who sees what.  So it is much more like a database than
>> Rabbit, in that new consumers can appear well after the disk write and
>> still subscribe to past messages.  Rabbit, by contrast, tries to
>> deliver to consumers and buffers otherwise.  Persistence is optional
>> but robust and a feature of the buffer ("queue") not the upstream
>> machinery.  Rabbit is able to cache-on-arrival via a plugin, but this
>> is a design overlay and not particularly optimal.
>>
>> *** Kafka is a client-server system with end-to-end semantics.  It

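Coming back to the point above about Kafka writing everything to disk
first: here is a minimal sketch, using the kafka-python client, of a
consumer that joins well after the messages were produced and still reads
them from the beginning (the topic name, group id and broker address are
made up for illustration).

    from kafka import KafkaConsumer  # kafka-python client

    # A consumer that starts from the earliest retained offset, so a job
    # that runs only once a day can still read the whole day's messages.
    consumer = KafkaConsumer(
        'clickstream',                     # hypothetical topic name
        bootstrap_servers='localhost:9092',
        group_id='hadoop-daily',           # hypothetical consumer group
        auto_offset_reset='earliest',      # begin with retained history
        enable_auto_commit=False,          # client tracks its own position
        consumer_timeout_ms=10000,         # stop iterating when caught up
    )

    for record in consumer:
        print(record.value)                # placeholder for real processing
    consumer.commit()                      # record our position when done

Because each consumer group keeps its own offset, a real-time consumer
(Storm) and a once-a-day consumer (Hadoop) can read the same topic
independently, and the broker does not have to track per-message acks for
either of them.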
 