Kafka >> mail # user >> Number of feeds, how does it scale?


Number of feeds, how does it scale?
Hi guys,

I'm wondering about experiences with a large number of feeds created
and managed on a single Kafka cluster.  Specifically, if anyone can
share information about how many different feeds they have on their
Kafka cluster, and their overall throughput, that'd be cool.

Some background: I'm planning on setting up a system around Kafka that
will (hopefully, eventually) have >10,000 feeds in parallel.  I expect
event volume on these feeds to follow a zipfian distribution.  So,
there will be a long-tail of smaller feeds and some large ones, but
there will be consumers for each of these feeds.  I'm trying to decide
between relying on Kafka's feeds to maintain the separation between
the data streams, and creating one large aggregate feed that uses
Kafka's partitioning mechanisms along with some custom logic to keep
the feeds separated.  I'd prefer to use Kafka's built-in feed
mechanisms, because there are significant benefits to that, but I can
also imagine that this many feeds was never among the base assumptions
of how the system would be used, which would make the performance
questionable.
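To make the aggregate-feed option concrete, here's a minimal sketch of what the custom separation logic might look like: key each message by its feed ID and map that key deterministically onto a partition of the single aggregate feed, so all events for one feed stay together and in order. The class name, hash scheme, and partition count below are illustrative assumptions, not anything Kafka prescribes.

```java
// Hypothetical sketch: route events for many logical feeds into the
// partitions of ONE aggregate Kafka feed. The same feed ID always maps
// to the same partition, preserving per-feed ordering.
public class FeedPartitioner {
    private final int numPartitions;

    public FeedPartitioner(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    // Deterministic mapping from a feed ID to a partition index.
    // Uses a simple String-style hash; any stable hash would do.
    public int partitionFor(String feedId) {
        int hash = 0;
        for (int i = 0; i < feedId.length(); i++) {
            hash = 31 * hash + feedId.charAt(i);
        }
        // floorMod keeps the result in [0, numPartitions) even for
        // negative hash values.
        return Math.floorMod(hash, numPartitions);
    }

    public static void main(String[] args) {
        FeedPartitioner p = new FeedPartitioner(64);
        // Same feed ID -> same partition on every call.
        System.out.println(p.partitionFor("feed-123") == p.partitionFor("feed-123"));
    }
}
```

Consumers for a given feed would then subscribe to the aggregate feed and filter by feed ID (or read only the relevant partition), which is exactly the extra bookkeeping that using separate Kafka feeds would avoid.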

Any input is appreciated.

--Eric