Re: Arguments for Kafka over RabbitMQ?
Actually, you don't need 100s of GBs to reap the benefits of Kafka over
Rabbit.  Because Kafka doesn't centrally maintain consumer state, it can
always sustain higher message throughput than Rabbit, even when no
messages are persisted to disk.
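
To make the statelessness point concrete, here is a minimal sketch using
the Java consumer API from later Kafka releases (the broker address, group
id, and topic name are placeholders):

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class OffsetDemo {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder
            props.put("group.id", "demo-group");              // placeholder
            props.put("enable.auto.commit", "false");         // commit explicitly below
            props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("firehose")); // placeholder topic
                while (true) {
                    ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> r : records) {
                        process(r.value());
                    }
                    // The consumer's entire "state" is one offset per partition,
                    // committed from the client side; the broker never tracks
                    // per-message acknowledgements.
                    consumer.commitSync();
                }
            }
        }

        private static void process(String value) { /* application logic */ }
    }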

However, Kafka's throughput advantage increases dramatically any time
Rabbit needs to spill messages to disk, because of the random I/O that
entails.  This is shown at the bottom, in the 'Large queues' section, of
the performance benchmarks Alexis shared previously:
http://www.rabbitmq.com/blog/2012/04/25/rabbitmq-performance-measurements-part-2/.
As you can see in the last graph, with 10 million messages (which is less
than a GB on disk) Rabbit's throughput is capped at around 10k msgs/sec.
Beyond throughput, with the pending 0.8 release, Kafka will also gain
advantages around message guarantees and durability.
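
To see why the disk access pattern, not just disk use, is what caps
throughput, here is a crude, self-contained illustration (file name and
sizes are arbitrary, and the OS page cache softens the gap until the data
set outgrows memory):

    import java.io.RandomAccessFile;
    import java.util.Random;

    public class SeqVsRandomWrites {
        public static void main(String[] args) throws Exception {
            byte[] msg = new byte[100];        // ~100-byte messages
            int n = 1_000_000;                 // ~100 MB total
            try (RandomAccessFile f = new RandomAccessFile("log.dat", "rw")) {
                long t0 = System.nanoTime();
                for (int i = 0; i < n; i++) {
                    f.write(msg);              // sequential append, like a log
                }
                long seq = System.nanoTime() - t0;

                Random rnd = new Random(42);
                t0 = System.nanoTime();
                for (int i = 0; i < n; i++) {
                    f.seek((long) rnd.nextInt(n) * msg.length); // scattered positions
                    f.write(msg);
                }
                long rand = System.nanoTime() - t0;

                System.out.printf("sequential: %d ms, random: %d ms%n",
                        seq / 1_000_000, rand / 1_000_000);
            }
        }
    }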

On Sun, Jun 9, 2013 at 9:58 AM, Mark <[EMAIL PROTECTED]> wrote:

> What if your messaging requirements are in the 100s of GBs?  Would you
> say RabbitMQ is probably a better fit?
>
> On Jun 8, 2013, at 4:03 PM, Jonathan Hodges <[EMAIL PROTECTED]> wrote:
>
> > I am not making any assumptions other than that Rabbit needs to
> > maintain the state of the consumers.  As the Kafka docs point out, this
> > is the fundamental difference between Kafka and most other providers in
> > the space.
> >
> > Thinking of a high-throughput stream of messages and many active
> > consumers of different speeds, I am struggling with how Rabbit can
> > avoid random I/O with all the acks.  Each consumer's state is certainly
> > not stored linearly on disk, so there would have to be seeks.  Further,
> > log-structured merge trees are used in NoSQL stores like Cassandra and
> > are optimized for random read access.  Why do you feel 'Rabbit does not
> > do lots of random IO'?
> >
> > Looking at some docs on the Rabbit site, they seem to mention that
> > performance degrades as the size of the persistent message store
> > increases.  Too much random I/O could certainly explain this
> > degradation.
> >
> > http://www.rabbitmq.com/blog/2011/09/24/sizing-your-rabbits/
> >
> > The use case I have been talking about all along is a continuous
> > firehose of data with throughput in the 100s of thousands of messages
> > per second.  You will have 10-20 consumers of different speeds, ranging
> > from real-time (Storm) to batch (Hadoop).  This means the message store
> > is in the 100s of GBs to terabytes range at all times.
> >
> > -Jonathan
> >
> >
> >
> > On Sat, Jun 8, 2013 at 2:09 PM, Alexis Richardson <[EMAIL PROTECTED]> wrote:
> >
> >> Jonathan
> >>
> >> I am aware of the difference between sequential writes and other kinds
> >> of writes ;p)
> >>
> >> AFAIK the Kafka docs describe a sort of platonic alternative system,
> >> eg "normally people do this.. Kafka does that..".  This is a good way
> >> to explain design decisions.  However, I think you may be assuming
> >> that Rabbit is a lot like the generalised other system.  But it is not
> >> - eg Rabbit does not do lots of random IO.  I'm led to understand that
> >> Rabbit's msg store is closer to log structured storage (a la
> >> Log-Structured Merge Trees) in some ways.  However, Rabbit does do
> >> more synchronous I/O, and has a different caching strategy (AFAIK).
> >> "It's complicated"
> >>
> >> In order to help provide useful info to the community, please could
> >> you describe a concrete test that we could discuss?  I think that
> >> would really help.  You mentioned a scenario with one large data set
> >> being streamed into the broker(s), and then consumed (in full?) by 2+
> >> consumers of wildly varying speeds.  Could you elaborate please?
> >>
> >> alexis
> >>
> >>
> >> Also, this is probably OT but I have never grokked this in the Design
> >> Doc:
> >>
> >> "Consumer rebalancing is triggered on each addition or removal of both
> >> broker nodes and other consumers within the same group. For a given
> >> topic and a given consumer group, broker partitions are divided evenly