Thanks for your reply. I'd love to read your blog post about your
experiences with it, especially around hardware configuration and how you
consume the data (few/many short/long-lived processes, average throughput
per topic). The cleanup script seems really useful too; I was considering
writing one that also cleans dead topics off ZooKeeper.
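For what it's worth, the filesystem side of such a cleaner could be sketched roughly like this (purely illustrative: the log-directory layout and age threshold are my assumptions, and the ZooKeeper half is left out):

```python
import os
import time

def clean_dead_topics(log_dir, max_age_secs):
    """Remove topic directories that are empty (no log segments left)
    and haven't been modified within max_age_secs."""
    removed = []
    now = time.time()
    for name in os.listdir(log_dir):
        topic_dir = os.path.join(log_dir, name)
        if not os.path.isdir(topic_dir):
            continue
        # A dead topic: nothing inside, and the mtime is stale.
        if not os.listdir(topic_dir) and now - os.path.getmtime(topic_dir) > max_age_secs:
            os.rmdir(topic_dir)
            removed.append(name)
    return removed
```

The staleness check is there so a topic that was just created (empty but about to receive data) doesn't get deleted out from under the broker.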
On Tue, Jul 31, 2012 at 8:58 PM, Taylor Gautier <[EMAIL PROTECTED]> wrote:
> Yes, we have done so at Tagged. I chronicled a bit of our experience here
> on the mailing list. Effectively we found that a single machine could
> not go above ~20k total topics. This could be OS-dependent, however (we use
> CentOS 5.x).
> Various tweaks we made to go further:
> 1. a beefed-up node.js kafka client/producer implementation -
> https://github.com/tagged/node-kafka - which lies at the heart of our kafka setup
> 2. our own kafka software load balancer (implemented using said library)
> that shards out independent Kafka instances (guarantees in-order delivery
> per topic and scales the # of kafka topics linearly as a function of the #
> of kafka machines)
> 3. a continuous cleaner that removes old dead topics completely from the
> filesystem (the 0.7 cleaner leaves an empty directory/file behind, which
> eats up open file handles and limits the max # of topics)
> 4. (coming soon) a hierarchical topic directory structure to ease the
> pain of too many directories/files in a single directory (should help
> raise the ~20k number, though probably by less than you might imagine)
> On our todo list is blogging about this in more detail, and contributing
> back more than just the node.js implementation.
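The load-balancer idea in point 2 comes down to deterministic per-topic routing: hash the topic name to pick an instance, so every message for a topic lands on the same broker (preserving in-order delivery) while total topic capacity grows with the machine count. A minimal sketch, with a hypothetical broker list rather than Tagged's actual implementation:

```python
import hashlib

def broker_for_topic(topic, brokers):
    """Route a topic to one Kafka instance by hashing its name.
    All messages for a given topic always go to the same instance,
    so per-topic ordering is preserved; adding machines raises the
    total number of topics the cluster can hold."""
    digest = hashlib.md5(topic.encode("utf-8")).hexdigest()
    return brokers[int(digest, 16) % len(brokers)]
```

One caveat with a plain hash-mod scheme like this: changing the number of machines reshuffles most topics to new instances; something like consistent hashing would limit that movement.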
> On Mon, Jul 30, 2012 at 8:39 AM, Lorenzo Alberton <[EMAIL PROTECTED]> wrote:
> > Is there anyone who tried Kafka with thousands of concurrent topics?
> > If so, what are your experiences? How did you tune it?
> > Thanks!