Kafka is not designed to support millions of topics. ZooKeeper will become
a bottleneck, even if you deploy more brokers to get around the
number-of-files issue. In normal operation, it might work just fine with a
right-sized cluster. However, when there are failures, recovery could take
a few minutes instead of a few hundred milliseconds or a few seconds.
Also, even if you go with the one-topic-per-session-id approach, it will
not scale as the key space grows in the future. A more scalable
approach is what Milind described: use the sticky partitioning feature in
0.8 so that a given session id always gets routed to the same partition.
Then each of your consumers can be sure that it will always receive data
for a subset of the session ids, so you can do any locality-sensitive
processing of user sessions on your consumers.
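
To make the idea concrete, here is a rough sketch of routing by session id.
This is not Kafka's own partitioner code; the function names and the use of
CRC32 as the hash are my own assumptions, chosen only to show that a stable
hash of the key modulo the partition count always sends a session id to the
same partition.

```python
import zlib


def partition_for(session_id: str, num_partitions: int) -> int:
    """Map a session id to a partition index in [0, num_partitions).

    Hypothetical helper for illustration; not part of any Kafka client API.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # CRC32 is stable across processes and runs, unlike Python's built-in
    # hash() for strings, which is randomized per interpreter process.
    return zlib.crc32(session_id.encode("utf-8")) % num_partitions


def group_by_partition(session_ids, num_partitions):
    """Group session ids by the partition each one would be routed to."""
    groups = {}
    for sid in session_ids:
        groups.setdefault(partition_for(sid, num_partitions), []).append(sid)
    return groups
```

Note that the mapping only stays put while the number of partitions is
unchanged; adding partitions to the topic would move some session ids to a
different partition.
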
On Thu, May 23, 2013 at 4:36 PM, Milind Parikh <[EMAIL PROTECTED]> wrote: