Kafka, mail # user - implications of using large number of topics....

implications of using large number of topics....
Jason Rosenberg 2012-10-10, 22:57

I'm exploring Kafka for the first time.

I'm contemplating a system where we transmit metric data to Kafka at regular
intervals.  One question I have is whether to generate simple messages with
very little metadata (just a timestamp and a value), keeping metadata like
the name/host/app that generated the metric out of the message and embodying
it in the name of the topic itself instead.  Alternatively, we could have a
relatively small number of topics whose messages include the source metadata
along with the timestamp and metric value.

1. On one hand, we'd have a large number of topics (say several hundred
thousand topics) with small messages, generated at a steady rate (say one
every 10 seconds).

2. Alternatively, we could have just a few topics that together receive
several hundred thousand messages every 10 seconds, with each message
carrying 2 or 3 times more data.
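
To make the two options concrete, here is a minimal Python sketch of what a producer might build in each case. The topic naming scheme (`metrics.<host>.<app>.<name>`), the JSON payload layout, and the figure of 300,000 sources are all assumptions chosen for illustration, not anything Kafka prescribes, and no Kafka client is involved.

```python
import json


def per_source_topic_message(host, app, name, timestamp, value):
    """Option 1: source metadata lives in the topic name; the payload is tiny."""
    # Hypothetical naming scheme: one topic per host/app/metric combination.
    topic = "metrics.{}.{}.{}".format(host, app, name)
    payload = json.dumps({"ts": timestamp, "v": value})
    return topic, payload


def shared_topic_message(host, app, name, timestamp, value):
    """Option 2: one shared topic; each payload carries its source metadata."""
    topic = "metrics"  # hypothetical shared topic name
    payload = json.dumps(
        {"host": host, "app": app, "name": name, "ts": timestamp, "v": value}
    )
    return topic, payload


def messages_per_second(num_sources, interval_seconds):
    """Aggregate message rate when every source emits once per interval."""
    return num_sources / interval_seconds


if __name__ == "__main__":
    args = ("web01", "checkout", "latency_ms", 1349909820, 12.5)
    print(per_source_topic_message(*args))
    print(shared_topic_message(*args))
    # Assumed: 300,000 sources, one message every 10 seconds each.
    print(messages_per_second(300000, 10))
```

Under these assumed numbers, both options produce the same aggregate message rate; what differs is the number of topics and the size of each message.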

I'm wondering whether Kafka's performance characteristics differ between
the 2 scenarios.

I like #1 because it simplifies targeted message consumption and enables
more interesting use of TopicFilter.  But I'm unsure whether there might
be performance concerns with Kafka (does it have to do more work to manage
each topic separately?).  Is this a common use case, or not?
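
As a rough illustration of what "targeted consumption" means under each option, the sketch below uses plain Python regular expressions as a stand-in for the idea of a topic whitelist. It is not Kafka's TopicFilter API; the topic names, the `host` payload field, and both helper functions are hypothetical.

```python
import json
import re


def select_topics(topic_names, whitelist_pattern):
    """Option 1: pick the topics of interest by matching on their names."""
    regex = re.compile(whitelist_pattern)
    return [t for t in topic_names if regex.fullmatch(t)]


def select_messages(payloads, host):
    """Option 2: read the shared topic and filter on a field in each payload."""
    return [p for p in payloads if json.loads(p)["host"] == host]


if __name__ == "__main__":
    topics = [
        "metrics.web01.checkout.latency_ms",
        "metrics.web01.checkout.error_count",
        "metrics.web02.checkout.latency_ms",
    ]
    # Option 1: only the matching topics would ever be read.
    print(select_topics(topics, r"metrics\.web01\..*"))

    payloads = [
        json.dumps({"host": "web01", "ts": 1349909820, "v": 12.5}),
        json.dumps({"host": "web02", "ts": 1349909820, "v": 9.0}),
    ]
    # Option 2: every message is read, then most are discarded.
    print(select_messages(payloads, "web01"))
```

The trade-off this is meant to show: with per-source topics the selection happens before any message is fetched, whereas with a shared topic the consumer has to read and parse every message to find the ones it wants.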

Thanks for any insight.