Kafka, mail # user - Re: Partitioning and scale - 2013-05-24, 15:40
Re: Partitioning and scale

Kafka is not designed to support millions of topics. Zookeeper will become
a bottleneck, even if you deploy more brokers to get around the
number-of-files issue. Under normal conditions it might work just fine with
a right-sized cluster. However, when there are failures, the time to
recover could be a few minutes instead of a few hundred milliseconds or a
few seconds.

Also, even if you go with the one-topic-per-session-id approach, it won't
scale as the key space grows. A more scalable approach is what Milind
described: use the sticky partitioning feature in 0.8 and have a session id
always get routed to a particular partition. Then each of your consumers
can be sure it will always receive data for a subset of the session ids,
and you can do any locality-sensitive processing of user sessions on the
consumers.
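
For illustration, here is a minimal sketch of that keyed-producer approach,
assuming the 0.8 Java producer API (kafka.javaapi.producer.Producer,
KeyedMessage, and a custom kafka.producer.Partitioner). The broker list,
topic name, package, and session id are placeholders, and the exact
Partitioner signature changed between 0.8 betas, so treat this as a sketch
rather than copy-paste code:

    // SessionProducerExample.java (single file; placeholder names throughout)
    package example;

    import java.util.Properties;

    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;
    import kafka.utils.VerifiableProperties;

    // Routes every message with the same session id to the same partition,
    // so each consumer always sees a stable subset of the session ids.
    class SessionIdPartitioner implements kafka.producer.Partitioner {
        public SessionIdPartitioner(VerifiableProperties props) {
            // Constructor required by the 0.8 producer; no props needed here.
        }

        public int partition(Object key, int numPartitions) {
            // Deterministic hash of the session id onto one partition.
            return (key.hashCode() & 0x7fffffff) % numPartitions;
        }
    }

    public class SessionProducerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("metadata.broker.list", "broker1:9092,broker2:9092"); // placeholder brokers
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            props.put("partitioner.class", "example.SessionIdPartitioner");  // the class above

            Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));

            // Key each message by session id; the custom partitioner keeps a
            // session pinned to one partition.
            String sessionId = "session-42";                                 // placeholder id
            producer.send(new KeyedMessage<String, String>(
                "user-sessions", sessionId, "event payload"));               // placeholder topic

            producer.close();
        }
    }

As long as the number of partitions stays fixed, whichever consumer owns a
partition will keep seeing the same session ids, which is what makes the
local, session-scoped processing possible.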

On Thu, May 23, 2013 at 4:36 PM, Milind Parikh <[EMAIL PROTECTED]> wrote: