Re: Arguments for Kafka over RabbitMQ ?
Jonathan Hodges 2013-06-08, 01:09
Thanks so much for your replies.  They have been a great help in
understanding Rabbit better, since I have very little experience with it.
I have a few follow-up comments below.
> While you are correct the payload is a much bigger concern, managing the
> metadata and acks centrally on the broker across multiple clients at scale
> is also a concern.  This would seem to be exacerbated if you have
> consumers at different speeds, i.e. Storm and Hadoop consuming the same
> topic.
>
> In that scenario, say storm consumes the topic messages in real-time and
> Hadoop consumes once a day.  Let’s assume the topic consists of 100k+
> messages/sec throughput so that in a given day you might have 100s GBs of
> data flowing through the topic.
>
> To allow Hadoop to consume once a day, Rabbit obviously can’t keep 100s of
> GBs in memory and will need to persist this data to its internal DB to be
> retrieved later.

I am not sure why you think this is a problem?

For a fixed number of producers and consumers, the pubsub and delivery
semantics of Rabbit and Kafka are quite similar.  Think of Rabbit as
adding an in-memory cache that is used to (a) speed up read
consumption, (b) obviate disk writes when possible due to all client
consumers being available and consuming.

Actually I think this is the main use case that sets Kafka apart from
Rabbit and speaks to the poster’s ‘Arguments for Kafka over RabbitMQ’
question.  As you mentioned Rabbit is a general purpose messaging system
and along with that has a lot of features not found in Kafka.  There are
plenty of times when Rabbit makes more sense than Kafka, but not when you
are maintaining large message stores and require high throughput to disk.

Persisting 100s GBs of messages to disk is a much different problem than
managing messages in memory.  Since Rabbit must maintain the state of the
consumers I imagine it's subject to random data access patterns on disk
as opposed to sequential.  Quoting the Kafka design page
(http://kafka.apache.org/07/design.html), the performance of sequential
writes on a six-disk 7200rpm SATA RAID-5 array is about 300MB/sec but the
performance of random writes is only about 50k/sec, a difference of nearly
10000X.
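
To make that gap concrete, here is a rough micro-benchmark sketch (mine, not
from the design page) that writes the same volume of data once sequentially
and once at random offsets. The file names and block counts are arbitrary,
and the absolute numbers depend entirely on the hardware and page cache; it
only illustrates why the two access patterns behave so differently.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Random;

// Contrast sequential appends with random-offset writes of the same total size.
public class SeqVsRandomWrite {
    static final int BLOCK = 4096;
    static final int BLOCKS = 25_000;          // ~100MB total

    public static void main(String[] args) throws IOException {
        ByteBuffer block = ByteBuffer.allocate(BLOCK);

        long t0 = System.nanoTime();
        try (FileChannel ch = FileChannel.open(Paths.get("seq.dat"),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            for (int i = 0; i < BLOCKS; i++) { block.clear(); ch.write(block); }
            ch.force(true);                    // flush so we measure the disk, not the cache
        }
        System.out.printf("sequential: %d ms%n", (System.nanoTime() - t0) / 1_000_000);

        Random rnd = new Random(42);
        long t1 = System.nanoTime();
        try (FileChannel ch = FileChannel.open(Paths.get("rand.dat"),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            for (int i = 0; i < BLOCKS; i++) {
                block.clear();
                ch.write(block, (long) rnd.nextInt(BLOCKS) * BLOCK);  // write at a random block
            }
            ch.force(true);
        }
        System.out.printf("random:     %d ms%n", (System.nanoTime() - t1) / 1_000_000);
    }
}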

They go on to say the persistent data structure used for messaging system
metadata is often a BTree. BTrees are the most versatile data structure
available, and make it possible to support a wide variety of transactional
and non-transactional semantics in the messaging system. They do come with
a fairly high cost, though: Btree operations are O(log N). Normally O(log
N) is considered essentially equivalent to constant time, but this is not
true for disk operations. Disk seeks come at 10 ms a pop, and each disk can
do only one seek at a time so parallelism is limited. Hence even a handful
of disk seeks leads to very high overhead. Since storage systems mix very
fast cached operations with actual physical disk operations, the observed
performance of tree structures is often superlinear. Furthermore BTrees
require a very sophisticated page or row locking implementation to avoid
locking the entire tree on each operation. The implementation must pay a
fairly high price for row-locking or else effectively serialize all reads.
Because of the heavy reliance on disk seeks it is not possible to
effectively take advantage of the improvements in drive density, and one is
forced to use small (< 100GB) high RPM SAS drives to maintain a sane ratio
of data to seek capacity.
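
As a back-of-envelope check on those numbers, the sketch below works out what
a seek-bound structure can sustain. The fanout of 100 and the worst case of
every tree level missing cache are my assumptions, not figures from the
design page; only the 10ms seek cost and the one-seek-at-a-time limit come
from the text above.

// Rough arithmetic: how many operations per second a seek-bound BTree allows.
public class SeekMath {
    public static void main(String[] args) {
        double seekMs = 10.0;                  // quoted cost of one disk seek
        double seeksPerSec = 1000.0 / seekMs;  // a single disk: ~100 seeks/sec

        long messages = 8_640_000_000L;        // 100k msgs/sec sustained for one day
        int fanout = 100;                      // assumed BTree fanout (hypothetical)
        int depth = (int) Math.ceil(Math.log(messages) / Math.log(fanout));

        // If every level of the tree misses cache, each operation costs `depth` seeks.
        double opsPerSec = seeksPerSec / depth;
        System.out.printf("tree depth ~%d, worst-case ~%.0f ops/sec per disk%n",
                depth, opsPerSec);
        // By contrast, a purely sequential write at 300MB/sec moves roughly
        // 300,000 one-KB messages per second on the same hardware.
    }
}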

Intuitively a persistent queue could be built on simple reads and appends
to files as is commonly the case with logging solutions. Though this
structure would not support the rich semantics of a BTree implementation,
it has the advantage that all operations are O(1) and reads do not
block writes or each other. This has obvious performance advantages since
the performance is completely decoupled from the data size--one server can
now take full advantage of a number of cheap, low-rotational speed 1+TB
SATA drives. Though they have poor seek performance, these drives often
have comparable performance for large reads and writes at 1/3 the price and
3x the capacity.
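
A minimal sketch of that idea, assuming a single length-prefixed file with
byte offsets as consumer positions. This is only an illustration of the
structure described above, not Kafka's actual log code.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// A persistent queue built on appends to the tail of a file and positional reads.
public class AppendOnlyLog {
    private final FileChannel channel;

    public AppendOnlyLog(String path) throws IOException {
        this.channel = FileChannel.open(Paths.get(path),
                StandardOpenOption.CREATE, StandardOpenOption.READ,
                StandardOpenOption.WRITE);
    }

    // Append one length-prefixed message; returns the offset a consumer would use.
    public synchronized long append(byte[] message) throws IOException {
        long offset = channel.size();
        ByteBuffer buf = ByteBuffer.allocate(4 + message.length);
        buf.putInt(message.length).put(message).flip();
        channel.write(buf, offset);            // sequential write at the tail
        return offset;
    }

    // Read the message stored at a given offset; reads never block appends.
    public byte[] read(long offset) throws IOException {
        ByteBuffer len = ByteBuffer.allocate(4);
        channel.read(len, offset);
        len.flip();
        ByteBuffer msg = ByteBuffer.allocate(len.getInt());
        channel.read(msg, offset + 4);
        return msg.array();
    }
}

A consumer's position is then just the offset of the next message, which is
why slow and fast consumers (the Storm vs Hadoop case above) can share the
same data without the broker tracking per-message state.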

Having access to virtually unlimited disk space without penalty means that
we can provide some features not usually found in a messaging system. For
example, in Kafka, instead of deleting a message immediately after
consumption, we can retain messages for a relatively long period (say a week).
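
If I recall correctly this shows up in the broker configuration as
log.retention.hours (168 hours being a week). The sketch below shows the kind
of background sweep such a time-based policy implies; the directory layout
and deletion rule here are my own assumptions for illustration, not Kafka's
actual cleanup code.

import java.io.File;
import java.util.concurrent.TimeUnit;

// Time-based retention as a file sweep: segments are just files, so "retain for
// a week" becomes "delete files whose last-modified time is older than seven days".
public class RetentionSweep {
    public static void main(String[] args) {
        long retentionMs = TimeUnit.DAYS.toMillis(7);
        long cutoff = System.currentTimeMillis() - retentionMs;

        File[] segments = new File("logs/my-topic-0").listFiles();  // hypothetical path
        if (segments == null) return;
        for (File segment : segments) {
            if (segment.lastModified() < cutoff) {
                System.out.println("would delete expired segment " + segment.getName());
                // segment.delete();
            }
        }
    }
}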

Our assumption is that the volume of messages is extremely high, indeed it
is some multiple of the total number of page views for the site (since a
page view is one of the activities we process). Furthermore we assume each
message published is read at least once (and often multiple times), hence
we optimize for consumption rather than production.

There are two common causes of inefficiency: too many network requests, and
excessive byte copying.

To encourage efficiency, the APIs are built around a "message set"
abstraction that naturally groups messages. This allows network requests to
group messages together and amortize the overhead of the network roundtrip
rather than sending a single message at a time.
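
As a sketch of what that batching looks like from a client's point of view:
the class and method names below are made up for illustration, not the actual
0.7 producer API, but the shape of the idea is the same.

import java.util.ArrayList;
import java.util.List;

// Accumulate messages and ship them as one "message set" per request, instead
// of paying a network round trip per message.
public class BatchingClient {
    private final List<byte[]> buffer = new ArrayList<>();
    private final int batchSize;

    public BatchingClient(int batchSize) { this.batchSize = batchSize; }

    public void send(byte[] message) {
        buffer.add(message);
        if (buffer.size() >= batchSize) flush();
    }

    // One round trip carries the whole batch, amortizing its overhead.
    public void flush() {
        if (buffer.isEmpty()) return;
        sendMessageSet(new ArrayList<>(buffer));
        buffer.clear();
    }

    private void sendMessageSet(List<byte[]> messageSet) {
        // placeholder for the actual wire call
        System.out.println("sending " + messageSet.size() + " messages in one request");
    }
}

The trade-off is a little extra latency while the batch fills in exchange for
much higher throughput, which fits the consumption-heavy workload described
above.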

The MessageSet implementation is itself a very thin API that wraps a byte
array or file. Hence there is no separate serialization or deserialization
step required for message processing; message fields are lazily
deserialized as needed (or not deserialized if not needed).
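
Roughly, that means a view over one buffer rather than a list of decoded
objects. A sketch, using a made-up length-prefixed layout rather than Kafka's
actual on-disk format:

import java.nio.ByteBuffer;

// Iterate a set of length-prefixed messages by slicing one buffer; nothing is
// copied and nothing is decoded until a caller actually reads the fields.
public class ByteBufferMessageView {
    private final ByteBuffer buffer;

    public ByteBufferMessageView(ByteBuffer buffer) { this.buffer = buffer.duplicate(); }

    public boolean hasNext() { return buffer.remaining() >= 4; }

    // Returns a read-only slice of the next message's payload.
    public ByteBuffer next() {
        int size = buffer.getInt();
        ByteBuffer payload = buffer.slice();
        payload.limit(size);
        buffer.position(buffer.position() + size);
        return payload.asReadOnlyBuffer();
    }
}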

The message log maintained by the broker is itself just a directory of
message sets that have been written to disk. This abstraction allows a
single byte format to be shared by both the broker and the consumer (and to
some degree the producer, though producer messages are checksummed and
validated before being added to the log).
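
For the checksum part, a sketch of the idea using a plain CRC32. The field
layout and where validation happens are assumptions; only the general
"verify before appending to the log" point comes from the text above.

import java.util.zip.CRC32;

// Recompute the CRC the producer attached and reject the message if it does not
// match, so a corrupted payload never enters the shared log format.
public class ChecksumCheck {
    public static boolean validate(long expectedCrc, byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload);
        return crc.getValue() == expectedCrc;
    }

    public static void main(String[] args) {
        byte[] payload = "page-view-event".getBytes();
        CRC32 crc = new CRC32();
        crc.update(payload);
        System.out.println("valid: " + validate(crc.getValue(), payload));  // true
    }
}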

Maintaining this common format allows optimization of the most important
operation: network transfer of persistent log chunks. Modern unix operating
systems offer a highly optimized code path for transferring data out of
pagecache to a