Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

Kafka >> mail # user >> implications of using large number of topics....

Re: implications of using large number of topics....
Cool,

What's the schedule for 0.8 coming out?  Are there any pre-release versions?

Jason

On Sat, Oct 13, 2012 at 8:37 PM, Jun Rao <[EMAIL PROTECTED]> wrote:

> Jason,
>
> The issue with 0.7 is that a topic exists on every broker and every time
> one adds a new broker, some additional partitions for each existing topic
> are added to the new broker. This is going to change in 0.8. A topic has a
> fixed number of partitions, independent of the # of brokers. So, by adding
> more brokers, we can support more topics in a cluster.
>
> Thanks,
>
> Jun
>
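A minimal sketch of the arithmetic behind Jun's point, assuming every partition costs a broker roughly the same and ignoring replication (each replica would add load). The function names and the 20,000-topic workload are hypothetical; they only model the two placement rules he describes: in 0.7 every topic lives on every broker, while in 0.8 a topic has a fixed partition count that is spread across the cluster.

```python
import math


def partitions_per_broker_v07(num_topics: int, partitions_per_topic_per_broker: int) -> int:
    # 0.7 model: every topic exists on every broker, so a broker's
    # partition count does not shrink when brokers are added.
    return num_topics * partitions_per_topic_per_broker


def partitions_per_broker_v08(num_topics: int, partitions_per_topic: int, num_brokers: int) -> int:
    # 0.8 model: each topic has a fixed partition count, independent of the
    # number of brokers. Assumes an even spread and ignores replication.
    total_partitions = num_topics * partitions_per_topic
    return math.ceil(total_partitions / num_brokers)


if __name__ == "__main__":
    topics = 20_000  # hypothetical workload: one partition per topic
    for brokers in (2, 5, 10):
        print(
            f"{brokers} brokers: "
            f"0.7 -> {partitions_per_broker_v07(topics, 1)} per broker, "
            f"0.8 -> {partitions_per_broker_v08(topics, 1, brokers)} per broker"
        )
```

Under these assumptions the 0.7 figure stays at 20,000 partitions per broker however many brokers are added, while the 0.8 figure falls to 2,000 per broker at 10 brokers, which is why adding brokers lets a 0.8 cluster support more topics.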
> On Fri, Oct 12, 2012 at 10:55 AM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:
>
> > Has there ever been a thought to better handle a large number of topics?
> > Prior discussions? Or would it likely be too great a change to the way
> > Kafka works, no matter what?
> >
> > I'm wondering if there's a way to have a notion of multiple "virtual"
> > topics which are internally managed as members of a single topic "group",
> > but which, at the API level, appear to be unique topics from the client
> > perspective.
> >
> > Naturally, it would be straightforward to implement something like this by
> > wrapping the current client APIs, but I'm wondering if there's any benefit
> > to building it into the internals. This would still have the downside that
> > a client subscribing to a virtual topic would have to, under the covers,
> > sift through lots of messages it's not interested in.
> >
> > Any other interesting approaches?
> >
> > Jason
> >
> >
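A minimal sketch of the client-side wrapping Jason describes, assuming each message in one physical topic is tagged with the name of the virtual topic it belongs to. `PhysicalTopic` and `VirtualTopicConsumer` are hypothetical names, and an in-memory list stands in for a real Kafka topic, so no actual Kafka client API is used. The `scanned` counter makes the downside he mentions visible: a consumer reads every message in the group, not just its own.

```python
from typing import Iterator, List, Tuple


class PhysicalTopic:
    """In-memory stand-in for one real topic (hypothetical; not a Kafka client API)."""

    def __init__(self) -> None:
        self._log: List[Tuple[str, bytes]] = []

    def append(self, virtual_topic: str, payload: bytes) -> None:
        # Each message is tagged with the virtual topic it belongs to.
        self._log.append((virtual_topic, payload))

    def read_from(self, offset: int) -> Iterator[Tuple[int, str, bytes]]:
        # Yield (offset, virtual_topic, payload) for every message from `offset` on.
        for i in range(offset, len(self._log)):
            virtual_topic, payload = self._log[i]
            yield i, virtual_topic, payload


class VirtualTopicConsumer:
    """Client-side filter that makes one virtual topic look like a real topic."""

    def __init__(self, physical: PhysicalTopic, virtual_topic: str) -> None:
        self.physical = physical
        self.virtual_topic = virtual_topic
        self.offset = 0   # next offset to read in the physical topic
        self.scanned = 0  # messages read, including ones that were discarded

    def poll(self) -> List[bytes]:
        matched: List[bytes] = []
        for offset, virtual_topic, payload in self.physical.read_from(self.offset):
            self.scanned += 1
            if virtual_topic == self.virtual_topic:
                matched.append(payload)
            self.offset = offset + 1
        return matched


if __name__ == "__main__":
    group = PhysicalTopic()
    group.append("orders", b"o1")
    group.append("clicks", b"c1")
    group.append("orders", b"o2")

    orders = VirtualTopicConsumer(group, "orders")
    print(orders.poll())   # [b'o1', b'o2']
    print(orders.scanned)  # 3: the "clicks" message was read and dropped
```

Building this into the broker, as Jason asks, could in principle let the server do the filtering instead; the sketch only shows the wrapper approach and its scanning cost.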
> > On Thu, Oct 11, 2012 at 10:48 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
> >
> > > Mathias,
> > >
> > > What matters is the total # of partitions, since each corresponds to a
> > > separate directory on disk. It doesn't matter how many topics those
> > > partitions are from.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
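A minimal sketch of Jun's point, applied to Mathias's setup: the number of partition directories depends only on the total partition count, so 300 single-partition topics and one topic with 300 partitions produce the same number of directories. The `partition_dirs` helper is hypothetical, and the `<topic>-<partition>` directory naming is an assumption used only for illustration.

```python
from typing import Dict, List


def partition_dirs(partition_counts: Dict[str, int]) -> List[str]:
    # One directory per (topic, partition) pair; the "<topic>-<partition>"
    # naming is assumed here for illustration only.
    return [
        f"{topic}-{partition}"
        for topic, count in partition_counts.items()
        for partition in range(count)
    ]


if __name__ == "__main__":
    many_topics = {f"topic{i}": 1 for i in range(300)}  # 300 topics x 1 partition
    one_topic = {"events": 300}                          # 1 topic x 300 partitions
    print(len(partition_dirs(many_topics)), len(partition_dirs(one_topic)))  # 300 300
```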
> > > On Thu, Oct 11, 2012 at 6:43 AM, Mathias Söderberg <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hey all,
> > > >
> > > > This is quite an interesting topic (no pun intended), and I've seen it
> > > > come up at least once before.
> > > >
> > > > A friend and I started experimenting with Kafka and ZooKeeper a little
> > > > while ago (building a publisher/subscriber system with consistent
> > > > hashing and whatnot), and we're currently using around 300 topics, all
> > > > with one partition each. So far we haven't done any serious performance
> > > > testing, but I'm planning to do so in the coming weeks. I have a few
> > > > questions regardless:
> > > >
> > > > Does (or should) it make any difference in performance to have a lot of
> > > > topics rather than one topic with a lot of partitions? I imagine the
> > > > system still needs to keep the same number of file descriptors open,
> > > > but I'm not sure how this would affect reads and writes. Are we going
> > > > to run into more random reads and writes by using a lot of topics
> > > > compared to using one topic with a lot of partitions? I can't really
> > > > wrap my head around this right now, mostly because of my rather limited
> > > > knowledge of how disks and page caches work.
> > > >
> > > > I should also add that we're mostly doing sequential reads (in rare
> > > > cases we have to rewind a topic) and that the number of topics doesn't
> > > > change.
> > > >
> > > > On 11 October 2012 05:13, Taylor Gautier <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > We used pattern #1 at Tagged. I wouldn't recommend it unless you're
> > > > > really committed. It took a lot of work to get it working right.
> > > > >
> > > > > a) Performance degraded non-linearly (read: it fell off a cliff) when
> > > > > brokers were managing more than about 20k topics. This was on a Linux
> > > > > RHEL 5.3 system with EXT3. YMMV.
> > > > >
> > > > > b) Startup time is significantly longer for a broker that is restarted,
> > > > > due to communication with ZK to sync up on those topics.