Kafka, mail # user - implications of using large number of topics....


Jason Rosenberg 2012-10-10, 22:57
Neha Narkhede 2012-10-10, 23:05
Jason Rosenberg 2012-10-10, 23:12
Jay Kreps 2012-10-10, 23:25
Taylor Gautier 2012-10-11, 03:13
Mathias Söderberg 2012-10-11, 13:43
Jun Rao 2012-10-12, 05:48
Jason Rosenberg 2012-10-12, 17:55
Jun Rao 2012-10-14, 03:37
Re: implications of using large number of topics....
Jason Rosenberg 2012-10-14, 06:42
Cool,

What's the schedule for 0.8 coming out?  Are there any pre-release versions?

Jason

On Sat, Oct 13, 2012 at 8:37 PM, Jun Rao <[EMAIL PROTECTED]> wrote:

> Jason,
>
> The issue with 0.7 is that a topic exists on every broker and every time
> one adds a new broker, some additional partitions for each existing topic
> are added to the new broker. This is going to change in 0.8. A topic has a
> fixed number of partitions, independent of the # of brokers. So, by adding
> more brokers, we can support more topics in a cluster.
>
> Thanks,
>
> Jun
>
> On Fri, Oct 12, 2012 at 10:55 AM, Jason Rosenberg <[EMAIL PROTECTED]>
> wrote:
>
> > Has there ever been a thought to better handle a large number of topics?
> > Prior discussions?  Or would it likely be too great of a change to the way
> > Kafka works, no matter what?
> >
> > I'm wondering if there's a way to have a notion of multiple "virtual"
> > topics which are internally managed as members of a single topic "group",
> > but which at the api level, appear to be unique topics, from the client
> > perspective.
> >
> > Naturally, it would be straightforward to implement something like this by
> > wrapping the current client apis, but I'm wondering if there's any benefit
> > to building it into the internals.  This would still have the downside that
> > a client subscribing to a virtual topic would have to, under the covers,
> > sift through lots of messages it's not interested in.
> >
> > Any other interesting approaches?
> >
> > Jason
> >
> >
> > On Thu, Oct 11, 2012 at 10:48 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
> >
> > > Mathias,
> > >
> > > What matters is the total # of partitions, since each corresponds to a
> > > separate directory on disk. It doesn't matter how many topics those
> > > partitions are from.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Thu, Oct 11, 2012 at 6:43 AM, Mathias Söderberg <
> > > [EMAIL PROTECTED]> wrote:
> > >
> > > > Hey all,
> > > >
> > > > This is a quite interesting topic (no pun intended), and I've seen it
> > > > come up at least once before.
> > > >
> > > > Me and a friend started experimenting with Kafka and ZooKeeper a little
> > > > while ago (building a publisher / subscriber system with consistent
> > > > hashing and whatnot) and currently we're using around 300 topics, all
> > > > with one partition each. So far we haven't really done any serious
> > > > performance testing, but I'm planning to do so in the following weeks.
> > > > But I've got a few questions regardless:
> > > >
> > > >
> > > > Does / should it make any difference in performance when one has a lot
> > > > of topics compared to having one topic with a lot of partitions? I'm
> > > > imagining that the system still needs to keep the same number of file
> > > > descriptors open, but I'm not sure how this would affect reads and
> > > > writes? Are we going to run into more random reads and writes by using a
> > > > lot of topics compared to using one topic with a lot of partitions
> > > > instead? Can't really wrap my head around this right now, mostly because
> > > > of my rather limited knowledge about how disks and page caches work.
> > > >
> > > > Could also add that we're mostly doing sequential reads (in rare cases
> > > > we have to rewind a topic) and that the number of topics doesn't change.
> > > >
> > > > On 11 October 2012 05:13, Taylor Gautier <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > We used pattern #1 at Tagged.  I wouldn't recommend it unless you're
> > > > > really committed.  It took a lot of work to get it working right.
> > > > >
> > > > > a) Performance degraded non-linearly (read: it fell off a cliff) when
> > > > > brokers were managing more than about 20k topics.  This was on a Linux
> > > > > RHEL 5.3 system with EXT3.  YMMV.
> > > > >
> > > > > b) Startup time is significantly longer for a broker that is restarted
> > > > > due to communication with ZK to sync up on those topics.
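
Jun's two points in the quoted messages above (each partition is a separate directory on disk; in 0.7 every topic exists on every broker, while in 0.8 a topic has a fixed number of partitions) can be sketched as rough arithmetic. This is only a back-of-the-envelope illustration: the function names, the even spread of partitions across brokers, and the choice to ignore replication are assumptions made for the example, not anything taken from Kafka itself. The 20k figure is borrowed from Taylor's message purely as sample input.

```python
import math

def dirs_per_broker_07(topics, partitions_per_topic_per_broker):
    """0.7 model as described in the thread: every topic exists on every
    broker, so each broker holds directories for all topics no matter how
    many brokers are in the cluster."""
    return topics * partitions_per_topic_per_broker

def dirs_per_broker_08(topics, partitions_per_topic, brokers):
    """0.8 model as described in the thread: a topic has a fixed number of
    partitions, independent of the number of brokers.
    Assumption: partitions are spread evenly and replication is ignored."""
    return math.ceil(topics * partitions_per_topic / brokers)

# Compare the per-broker directory count as the cluster grows.
for brokers in (1, 2, 4, 8):
    print(brokers,
          dirs_per_broker_07(20000, 1),
          dirs_per_broker_08(20000, 1, brokers))
```

Under these assumptions the 0.7 count per broker stays flat as brokers are added, while the 0.8 count shrinks, which is the sense in which adding brokers lets a cluster support more topics.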
Jun Rao 2012-10-15, 18:42
Jason Rosenberg 2012-10-11, 16:24
Taylor Gautier 2012-10-11, 17:08
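
Jason's idea, quoted above, of "virtual" topics managed as members of a single topic group could look roughly like the wrapper below. It is a minimal in-memory sketch, not Kafka client code: `InMemoryLog` stands in for one physical topic, and `VirtualTopics`, `publish` and `consume` are hypothetical names invented for the illustration. It also shows the downside he mentions, since a consumer of one virtual topic still has to scan every message in the group.

```python
class InMemoryLog:
    """Stand-in for one physical topic: an append-only list of messages.
    (Hypothetical; a real setup would use a Kafka producer and consumer.)"""

    def __init__(self):
        self.messages = []

    def append(self, message):
        self.messages.append(message)

    def read_from(self, offset):
        return self.messages[offset:]


class VirtualTopics:
    """Multiplexes many virtual topics onto a single underlying log by
    tagging each message with its virtual topic name."""

    def __init__(self, log):
        self.log = log

    def publish(self, virtual_topic, payload):
        # The tag travels with the payload so consumers can filter on it.
        self.log.append((virtual_topic, payload))

    def consume(self, virtual_topic, offset=0):
        """Return (matching payloads, number of messages scanned)."""
        scanned = 0
        matched = []
        for tag, payload in self.log.read_from(offset):
            scanned += 1
            if tag == virtual_topic:
                matched.append(payload)
        return matched, scanned


group = VirtualTopics(InMemoryLog())
for i in range(6):
    group.publish(f"user-{i % 3}", f"event-{i}")

matched, scanned = group.consume("user-1")
print(matched, scanned)  # ['event-1', 'event-4'] 6
```

The scanned count is the cost of the approach: it grows with the traffic of the whole group, not just the virtual topic being read.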