Kafka >> mail # user >> implications of using large number of topics....

Re: implications of using large number of topics....
Ok,

Perhaps for the sake of argument, consider the case where we have just 1
Kafka broker.  It sounds like it would need to keep a file handle open for
each topic.  Is that right?

Jason
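
To make the file-handle question concrete, here is a rough back-of-the-envelope sketch. It assumes (this is not confirmed anywhere in the thread) that a broker holds at least one open log file per topic partition; `partitions_per_topic`, `open_files_per_partition` and `reserved` are hypothetical knobs for illustration, not Kafka settings.

```python
def handles_needed(num_topics, partitions_per_topic=1, open_files_per_partition=1):
    """Estimate open log-file handles on one broker.

    Assumption: one (or more) open file per topic partition. The real
    number depends on the Kafka version and on how many log segments
    the broker keeps open.
    """
    return num_topics * partitions_per_topic * open_files_per_partition


def fits_within_limit(num_topics, fd_limit, reserved=1024, **kwargs):
    """True if the estimate still leaves `reserved` descriptors free
    for sockets and other files (the reserve is an arbitrary guess)."""
    return handles_needed(num_topics, **kwargs) + reserved <= fd_limit


if __name__ == "__main__":
    try:
        import resource  # Unix only
        soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        if soft == resource.RLIM_INFINITY:
            soft = float("inf")
    except ImportError:
        soft = 1024  # placeholder when the limit cannot be queried

    # Illustrative topic counts only; 300,000 stands in for
    # "several hundred thousand".
    for topics in (1_000, 300_000):
        print(topics, handles_needed(topics), fits_within_limit(topics, soft))
```

On a typical default descriptor limit the larger count would not fit, so the limit would have to be raised, if the one-file-per-topic assumption holds.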

On Wed, Oct 10, 2012 at 4:05 PM, Neha Narkhede <[EMAIL PROTECTED]> wrote:

> Hi Jason,
>
> We use option #2 at LinkedIn for metrics and tracking data. Supporting
> Option #1 in Kafka 0.7 has its challenges since every topic is stored
> on every broker, by design. Hence, the number of topics a cluster can
> support is limited by the IO and number of open file handles on each
> broker. After Kafka 0.8 is released, the distribution of topics to
> brokers is user defined and can scale out with the number of brokers.
> Having said that, some Kafka users have successfully deployed Kafka
> 0.7 clusters hosting a very high number of topics. I hope they can share
> their experiences here.
>
> Thanks,
> Neha
>
> On Wed, Oct 10, 2012 at 3:57 PM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > I'm exploring using kafka for the first time.
> >
> > I'm contemplating a system where we transmit metric data at regular
> > intervals to kafka.  One question I have is whether to generate simple
> > messages with very little meta data (just timestamp and value), keeping
> > meta data like the name/host/app that generated the metric out of the
> > message and embodying it in the name of the topic itself instead.
> > Alternatively, we could have a relatively small number of topics, which
> > contain messages that include source meta data along with the timestamp
> > and metric value in each message.
> >
> > 1. On one hand, we'd have a large number of topics (say several hundred
> > thousand topics) with small messages, generated at a steady rate (say one
> > every 10 seconds).
> >
> > 2. Alternatively, we could have just a few topics, which receive several
> > hundred thousand messages every 10 seconds, which contain 2 or 3 times more
> > data per message.
> >
> > I'm wondering if kafka has any performance characteristics that differ for
> > the 2 scenarios.
> >
> > I like #1 because it simplifies targeted message consumption, and enables
> > more interesting use of TopicFilter'ing.  But I'm unsure whether there
> > might be performance concerns with kafka (does it have to do more work to
> > separately manage each topic?).  Is this a common use case, or not?
> >
> > Thanks for any insight.
> >
> > Jason
>
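
For readers comparing the two layouts described in the original question, the following is a minimal sketch of how each might encode the same metric sample. The topic naming scheme, the JSON field names and the 300,000-source figure are all assumptions made for illustration; nothing here is prescribed by Kafka or stated by the posters.

```python
import json


def encode_option1(name, host, app, timestamp, value):
    """Option 1: source meta data lives in the topic name, so the
    message carries only timestamp and value."""
    topic = f"metrics.{app}.{host}.{name}"  # hypothetical naming scheme
    payload = json.dumps({"ts": timestamp, "v": value}).encode("utf-8")
    return topic, payload


def encode_option2(name, host, app, timestamp, value, topic="metrics"):
    """Option 2: one shared topic; every message repeats the source
    meta data alongside timestamp and value."""
    payload = json.dumps(
        {"name": name, "host": host, "app": app, "ts": timestamp, "v": value}
    ).encode("utf-8")
    return topic, payload


def messages_per_second(num_sources, interval_seconds):
    """Aggregate message rate, which is the same under either layout."""
    return num_sources / interval_seconds


if __name__ == "__main__":
    sample = ("cpu", "host1", "web", 1349910000, 0.5)
    t1, p1 = encode_option1(*sample)
    t2, p2 = encode_option2(*sample)
    print(t1, len(p1), "bytes")
    print(t2, len(p2), "bytes")
    # 300,000 sources is an illustrative stand-in for
    # "several hundred thousand".
    print(messages_per_second(300_000, 10), "messages/second")
```

The total message rate is identical in both cases; what changes is the number of topics the broker has to manage versus the number of bytes per message, which is the trade-off the thread is asking about.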