I'm wondering about experiences with a large number of feeds created
and managed on a single Kafka cluster. Specifically, if anyone can
share information about how many different feeds they have on their
Kafka cluster, and the overall throughput, that'd be great.
Some background: I'm planning on setting up a system around Kafka that
will (hopefully, eventually) have >10,000 feeds in parallel. I expect
event volume on these feeds to follow a Zipfian distribution. So,
there will be a long tail of smaller feeds and some large ones, but
there will be consumers for each of these feeds. I'm trying to decide
between relying on Kafka's feeds to maintain the separation between
the data streams, or if I should actually create one large aggregate
feed and utilize Kafka's partitioning mechanisms along with some
custom logic to keep the feeds separated. I'd prefer to use Kafka's
built-in feed mechanisms, because there are significant benefits to
that, but I can also imagine that this many feeds was not among the
base assumptions of how the system would be used, and so performance
might be questionable.
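To make the second option concrete, here is a minimal sketch of what the "one aggregate feed plus custom logic" approach might look like. All names here are hypothetical (the partition count, the `partition_for` and `demux` helpers); it only illustrates the two pieces of custom logic involved: a deterministic mapping from feed id to partition on the producer side, and demultiplexing the aggregate stream back into per-feed streams on the consumer side.

```python
import hashlib

NUM_PARTITIONS = 32  # hypothetical partition count for the aggregate feed


def partition_for(feed_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Deterministically map a feed id to a partition, so every event
    for a given feed lands on the same partition and stays ordered."""
    digest = hashlib.md5(feed_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def demux(messages):
    """Consumer-side helper: split an aggregate stream of
    (feed_id, payload) pairs back into per-feed event lists."""
    feeds = {}
    for feed_id, payload in messages:
        feeds.setdefault(feed_id, []).append(payload)
    return feeds
```

The trade-off this makes visible: with one aggregate feed, every consumer of the partition sees (and must skip past) events for feeds it doesn't care about, whereas separate feeds let Kafka do that separation for you.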
Any input is appreciated.
Taylor Gautier 2012-04-09, 19:11
Neha Narkhede 2012-04-09, 19:53
Taylor Gautier 2012-04-09, 20:42
Jay Kreps 2012-04-09, 20:55