Kafka, mail # user - Thousands of topics


Re: Thousands of topics
Lorenzo Alberton 2012-08-07, 21:47
Hi Taylor,

thanks for your reply. I'd love to read your blog post about your
experiences with it, especially around hardware configuration and how you
consume the data (few/many short/long-lived processes, average throughput
per topic). The cleanup script seems really useful too; I was considering
writing one that also cleans dead topics out of ZooKeeper.

Thanks!

Lorenzo
On Tue, Jul 31, 2012 at 8:58 PM, Taylor Gautier <[EMAIL PROTECTED]> wrote:

> Yes, we have done so at Tagged.  I chronicled a bit of our experience here
> on the mailing list.  Effectively we found that a single machine could
> not go above ~20k total topics.  This could be OS-dependent, however (we
> use CentOS 5.x).
>
> Various tweaks we made to go further:
>
>    1. a beefed-up node.js Kafka client/producer implementation
>    (https://github.com/tagged/node-kafka), which lies at the heart of our
>    Kafka deployment
>    2. our own Kafka software load balancer (implemented using said library)
>    that shards topics across independent Kafka instances (guarantees
>    in-order delivery per topic and scales the # of Kafka topics linearly
>    as a function of the # of Kafka machines)
>    3. a continuous cleaner that removes old dead topics completely from the
>    filesystem (the 0.7 cleaner leaves an empty directory/file behind, which
>    eats up open file handles and limits the max # of topics)
>    4. (coming soon) a hierarchical topic directory structure to ease the
>    pain of too many directories/files in a single directory (should help
>    the ~20k number, though probably by less than you might imagine)
>
> On our todo list is blogging about this in more detail, and contributing
> back more than just the node.js implementation.
>
> On Mon, Jul 30, 2012 at 8:39 AM, Lorenzo Alberton <[EMAIL PROTECTED]> wrote:
>
> > Is there anyone who tried Kafka with thousands of concurrent topics?
> > If so, what are your experiences? How did you tune it?
> >
> > Thanks!
> >
>
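
For readers of the archive: the topic-sharding idea in point 2 of the quoted
message (each topic pinned to one of several independent Kafka instances, so
ordering per topic is preserved while the total topic count grows with the
number of machines) can be illustrated with a minimal sketch. This is not
Tagged's load balancer, whose internals are not described in the thread; the
function name, the broker addresses, and the choice of a plain hash-modulo
mapping are all assumptions made for illustration.

```python
import hashlib


def shard_for_topic(topic: str, brokers: list[str]) -> str:
    """Deterministically map a topic name to exactly one broker.

    Hypothetical helper: because every message for a given topic goes to
    the same independent Kafka instance, per-topic ordering is kept, and
    each machine only has to hold its own share of the topics.
    """
    if not brokers:
        raise ValueError("at least one broker is required")
    # Use a stable hash (not Python's per-process randomized hash())
    # so that every client computes the same mapping.
    digest = hashlib.md5(topic.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(brokers)
    return brokers[index]


# Hypothetical broker addresses, for illustration only.
BROKERS = ["kafka-1:9092", "kafka-2:9092", "kafka-3:9092"]

if __name__ == "__main__":
    for name in ("user-1", "user-2", "user-3"):
        print(name, "->", shard_for_topic(name, BROKERS))
```

One caveat of a plain modulo mapping is that adding or removing a machine
reassigns most topics; a real deployment would probably want consistent
hashing or an explicit topic-to-instance table instead.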