Kafka, mail # user - Partitioning and scale


Re: Partitioning and scale
Neha Narkhede 2013-05-24, 15:40
Timothy,

Kafka is not designed to support millions of topics. ZooKeeper will become
a bottleneck, even if you deploy more brokers to get around the number-of-files
issue. In normal cases, it might work just fine with a right-sized
cluster. However, when there are failures, the time to recover could be a
few minutes instead of a few hundred milliseconds or a few seconds.

Also, even if you go with the one-topic-per-session-id approach, it will
not scale as the key space grows in the future. A more scalable approach is
what Milind described: use the sticky partitioning feature in 0.8 and have a
session id always get routed to a particular partition. Then each of your
consumers can be sure that it will always receive data for a subset of the
session ids, so you can do any locality-sensitive processing of user
sessions in your consumers.
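
A minimal sketch of this keyed-routing idea (illustration only; the partition
count, hash choice, and session ids below are made up, and a custom producer
partitioner would apply the same logic):

import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Illustrative sketch only: routing on a hashed session-id key keeps every
// event for the same session on the same partition, as long as the partition
// count stays fixed. The partition count and session ids are made-up values.
public class SessionIdRouting {

    // Hash the session id and map it onto a fixed number of partitions.
    static int partitionFor(String sessionId, int numPartitions) {
        CRC32 crc = new CRC32();
        crc.update(sessionId.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 12;  // assumed topic size
        String[] sessionIds = {"session-17", "session-42", "session-17"};
        for (String id : sessionIds) {
            // The repeated key always lands on the same partition.
            System.out.println(id + " -> partition " + partitionFor(id, numPartitions));
        }
    }
}

With a fixed number of partitions, a consumer that owns a given partition can
rely on seeing every event for the session ids that hash to it.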

Thanks,
Neha
On Thu, May 23, 2013 at 4:36 PM, Milind Parikh <[EMAIL PROTECTED]> wrote:

> Number of files to be managed by the OS, I suppose.
>
> Why wouldn't you use consistent hashing with deliberately engineered
> collisions to generate a limited number of topics / partitions and filter
> at the consumer level?
>
> Regards
> Milind
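
A rough sketch of the filter-at-the-consumer-level idea Milind suggests above
(illustration only; the Event type, session ids, and pre-fetched records are
assumptions, not Kafka consumer API calls):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: one partition carries many colliding session ids,
// and the consumer fans the mixed stream back out into per-session buckets
// before doing any session-local processing. A real consumer would fetch
// these records from the partition it owns.
public class ConsumerSideFiltering {

    static final class Event {
        final String sessionId;
        final String payload;
        Event(String sessionId, String payload) {
            this.sessionId = sessionId;
            this.payload = payload;
        }
    }

    public static void main(String[] args) {
        // Pretend these events were all read from the single partition this
        // consumer owns; several session ids share it by design.
        List<Event> fetched = new ArrayList<Event>();
        fetched.add(new Event("session-17", "click"));
        fetched.add(new Event("session-42", "view"));
        fetched.add(new Event("session-17", "purchase"));

        // Group ("filter") the mixed stream by session id.
        Map<String, List<Event>> bySession = new HashMap<String, List<Event>>();
        for (Event e : fetched) {
            List<Event> events = bySession.get(e.sessionId);
            if (events == null) {
                events = new ArrayList<Event>();
                bySession.put(e.sessionId, events);
            }
            events.add(e);
        }

        // Each per-session list can now be processed with full locality.
        for (Map.Entry<String, List<Event>> entry : bySession.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue().size() + " events");
        }
    }
}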
> On May 23, 2013 4:22 PM, "Timothy Chen" <[EMAIL PROTECTED]> wrote:
>
> > Hi Neha,
> >
> > Not sure if this sounds crazy, but if we'd like to have the events for
> > the same session id go to the same partition, one way could be that each
> > session key creates its own topic with a single partition; therefore there
> > could be millions of topics, each with a single partition.
> >
> > I wonder what would be the bottleneck of doing this?
> >
> > Thanks,
> >
> > Tim
> >
> >
> > On Wed, May 22, 2013 at 4:32 PM, Neha Narkhede <[EMAIL PROTECTED]> wrote:
> >
> > > Not automatically as of today. You have to run the reassign-partitions
> > > tool and explicitly move selected partitions to the new brokers. If you
> > > use this tool, you can move partitions to the new brokers without any
> > > downtime.
> > >
> > > Thanks,
> > > Neha
> > >
> > >
> > > On Wed, May 22, 2013 at 2:20 PM, Timothy Chen <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi Neha/Chris,
> > > >
> > > > Thanks for the reply. So if I set a fixed number of partitions and
> > > > just add brokers to the broker pool, does it rebalance the load to
> > > > the new brokers (along with the data)?
> > > >
> > > > Tim
> > > >
> > > >
> > > > On Wed, May 22, 2013 at 1:15 PM, Neha Narkhede <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > - I see that Kafka server.properties allows one to specify the
> > > > > number of partitions it supports. However, when we want to scale, I
> > > > > wonder: if we add partitions or brokers, will the same partitioner
> > > > > start distributing the messages to different partitions? And if it
> > > > > does, how can that same consumer continue to read off the messages
> > > > > of those ids if it was interrupted in the middle?
> > > > >
> > > > > The num.partitions config in server.properties is used only for
> > > > > topics that are auto-created (controlled by
> > > > > auto.create.topics.enable). For topics that you create using the
> > > > > admin tool, you can specify the number of partitions that you want.
> > > > > After that, there is currently no way to change it. For that reason,
> > > > > it is a good idea to over-partition your topic, which also helps
> > > > > load-balance partitions onto the brokers. You are right that if you
> > > > > change the number of partitions later, then messages that previously
> > > > > stuck to a certain partition would now get routed to a different
> > > > > partition, which is undesirable for applications that want to use
> > > > > sticky partitioning (see the sketch after this quoted thread).
> > > > >
> > > > > - I'd like to create a consumer per partition, and for each one to
> > > > > subscribe to the changes of that one. How can this be done in
> > > > > Kafka?
> > > > >
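
To illustrate the point above about changing the partition count (a sketch
with an assumed hash-mod scheme and made-up partition counts, not Kafka's
exact default partitioner): once the modulus changes, many keys map to a
different partition, which is why previously sticky session ids get re-routed.

// Illustrative sketch only: shows why growing the partition count re-routes
// keys that previously "stuck" to one partition. The hash-mod scheme and the
// partition counts are assumptions for the example.
public class RepartitioningEffect {

    static int partitionFor(String key, int numPartitions) {
        // Mask off the sign bit so the modulo result is never negative.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        String sessionId = "session-42";
        int before = partitionFor(sessionId, 8);   // original partition count
        int after = partitionFor(sessionId, 16);   // after adding partitions
        System.out.println(sessionId + ": partition " + before
                + " with 8 partitions, partition " + after + " with 16");
        // With this scheme, roughly half of all keys move to a different
        // partition when going from 8 to 16, so consumers relying on session
        // locality can no longer assume a given session stays put.
    }
}

Kafka's actual key partitioner may differ in detail; the point is only that
the key-to-partition mapping depends on the partition count.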