Kafka >> mail # user >> implications of using large number of topics....


Re: implications of using large number of topics....
Ok,

Perhaps, for the sake of argument, consider the case where we have just 1
Kafka broker.  It sounds like it will need to keep a file handle open for
each topic.  Is that right?

Jason
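
[Editorial sketch] The file-handle question above can be framed as a back-of-envelope calculation. The following is a minimal sketch, not a statement of how any particular Kafka version behaves: it assumes the broker keeps at least one log segment file open per topic partition, and the function names, the partition and segment counts, and the descriptor limit are all hypothetical values chosen for illustration.

```python
def estimate_open_log_files(topics, partitions_per_topic=1, segments_per_partition=1):
    """Rough count of log files one broker would hold open.

    Assumption (not confirmed by the thread): every partition of every
    topic hosted on the broker keeps each of its segment files open.
    """
    return topics * partitions_per_topic * segments_per_partition


def fits_in_fd_limit(topics, fd_limit, **kwargs):
    """True if the estimated file count stays within a descriptor limit."""
    return estimate_open_log_files(topics, **kwargs) <= fd_limit


if __name__ == "__main__":
    topics = 300_000      # stand-in for "several hundred thousand topics"
    fd_limit = 65_536     # hypothetical per-process limit (see `ulimit -n`)
    needed = estimate_open_log_files(topics)
    print(f"estimated open log files: {needed}")
    print(f"fits within limit of {fd_limit}: {fits_in_fd_limit(topics, fd_limit)}")
```

Even with one partition and one segment per topic, the estimate grows linearly with the topic count, which is why the descriptor limit on a single broker is the first thing to check.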

On Wed, Oct 10, 2012 at 4:05 PM, Neha Narkhede <[EMAIL PROTECTED]> wrote:

> Hi Jason,
>
> We use option #2 at LinkedIn for metrics and tracking data. Supporting
> option #1 in Kafka 0.7 has its challenges, since every topic is stored
> on every broker, by design. Hence, the number of topics a cluster can
> support is limited by the I/O capacity and the number of open file
> handles on each broker. After Kafka 0.8 is released, the distribution of
> topics to brokers will be user-defined and can scale out with the number
> of brokers. Having said that, some Kafka users have successfully deployed
> Kafka 0.7 clusters hosting a very high number of topics. I hope they can
> share their experiences here.
>
> Thanks,
> Neha
>
> On Wed, Oct 10, 2012 at 3:57 PM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > I'm exploring using Kafka for the first time.
> >
> > I'm contemplating a system where we transmit metric data at regular
> > intervals to Kafka.  One question I have is whether to generate simple
> > messages with very little metadata (just timestamp and value), keeping
> > metadata like the name/host/app that generated the metric out of the
> > message and embodying it in the name of the topic itself instead.
> > Alternatively, we could have a relatively small number of topics whose
> > messages include source metadata along with the timestamp and metric
> > value.
> >
> > 1. On one hand, we'd have a large number of topics (say several hundred
> > thousand topics) with small messages, generated at a steady rate (say one
> > every 10 seconds).
> >
> > 2. Alternatively, we could have just a few topics, which receive several
> > hundred thousand messages every 10 seconds, each containing 2 or 3 times
> > more data.
> >
> > I'm wondering if Kafka has any performance characteristics that differ
> > for the 2 scenarios.
> >
> > I like #1 because it simplifies targeted message consumption, and enables
> > more interesting use of TopicFilter.  But I'm unsure whether there
> > might be performance concerns with Kafka (does it have to do more work to
> > separately manage each topic?).  Is this a common use case, or not?
> >
> > Thanks for any insight.
> >
> > Jason
>
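
[Editorial sketch] The two layouts in the original question can be compared side by side. This is a minimal illustration under stated assumptions, not code from the thread: the topic naming scheme (`metrics.<app>.<host>.<name>`), the JSON encoding, the field names, and every function below are hypothetical. The regex selection only mimics the idea of filtering topics by name; it does not call any Kafka API.

```python
import json
import re


def option1_record(name, host, app, timestamp, value):
    """Option #1: metadata is carried by the topic name, payload stays tiny."""
    topic = f"metrics.{app}.{host}.{name}"  # hypothetical naming scheme
    payload = json.dumps({"ts": timestamp, "v": value})
    return topic, payload


def option2_record(name, host, app, timestamp, value):
    """Option #2: one shared topic, metadata travels inside each message."""
    topic = "metrics"
    payload = json.dumps(
        {"name": name, "host": host, "app": app, "ts": timestamp, "v": value}
    )
    return topic, payload


def select_topics(topics, pattern):
    """Option #1 consumption: pick topics whose full name matches a regex."""
    rx = re.compile(pattern)
    return [t for t in topics if rx.fullmatch(t)]


def select_messages(payloads, app):
    """Option #2 consumption: read everything, filter on the embedded field."""
    return [p for p in payloads if json.loads(p)["app"] == app]


def messages_per_second(sources, interval_seconds):
    """Aggregate message rate when each source emits once per interval."""
    return sources / interval_seconds


if __name__ == "__main__":
    args = ("cpu", "web01", "checkout", 1349909820, 0.75)
    t1, p1 = option1_record(*args)
    t2, p2 = option2_record(*args)
    print(t1, len(p1), "bytes")
    print(t2, len(p2), "bytes")
    print(select_topics([t1, "metrics.search.web02.cpu"], r"metrics\.checkout\..*"))
    print(messages_per_second(300_000, 10), "messages/second in either layout")
```

The total message rate is the same in both layouts. What differs is where the selection work happens: option #1 pushes it into topic names (and into per-topic overhead on the broker), while option #2 keeps the topic count small and makes consumers filter on message contents.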