
Kafka >> mail # user >> Partitioning and scale


Re: Partitioning and scale
Hi Neha,

Not sure if this sounds crazy, but if we'd like the events for the same
session id to go to the same partition, one way could be for each session
key to get its own topic with a single partition. That could mean
millions of single-partition topics.

I wonder what the bottleneck of doing this would be?

Thanks,

Tim
On Wed, May 22, 2013 at 4:32 PM, Neha Narkhede <[EMAIL PROTECTED]> wrote:

> Not automatically as of today. You have to run the reassign-partitions tool
> and explicitly move selected partitions to the new brokers. If you use this
> tool, you can move partitions to the new broker without any downtime.
>
> Thanks,
> Neha
>
>
> On Wed, May 22, 2013 at 2:20 PM, Timothy Chen <[EMAIL PROTECTED]> wrote:
>
> > Hi Neha/Chris,
> >
> > Thanks for the reply. So if I set a fixed number of partitions and just add
> > brokers to the broker pool, does it rebalance the load to the new brokers
> > (along with the data)?
> >
> > Tim
> >
> >
> > On Wed, May 22, 2013 at 1:15 PM, Neha Narkhede <[EMAIL PROTECTED]> wrote:
> >
> > > - I see that Kafka server.properties allows one to specify the number of
> > > partitions it supports. However, when we want to scale, I wonder: if we
> > > add partitions or brokers, will the same partitioner start distributing
> > > the messages to different partitions?
> > > And if it does, how can the same consumer continue to read the
> > > messages for those ids if it was interrupted in the middle?
> > >
> > > The num.partitions config in server.properties is used only for topics
> > > that are auto-created (controlled by auto.create.topics.enable). For
> > > topics that you create using the admin tool, you can specify the number
> > > of partitions that you want. After that, there is currently no way to
> > > change it. For that reason, it is a good idea to over-partition your
> > > topic, which also helps load-balance partitions onto the brokers. You are
> > > right that if you change the number of partitions later, then messages
> > > that previously stuck to a certain partition would now get routed to a
> > > different partition, which is undesirable for applications that want to
> > > use sticky partitioning.
> > >
> > > - I'd like to create a consumer per partition, and for each one to
> > > subscribe to the changes of that one. How can this be done in kafka?
> > >
> > > For your use case, it seems like SimpleConsumer might be a better fit.
> > > However, it will require you to write code to handle discovery of the
> > > leader for the partition that your consumer is consuming. Chris has
> > > written up a great example that you can follow -
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example
> > >
> > > Thanks,
> > > Neha
> > >
> > >
> > > On Wed, May 22, 2013 at 12:37 PM, Chris Curtin <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi Tim,
> > > >
> > > >
> > > > On Wed, May 22, 2013 at 3:25 PM, Timothy Chen <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I'm currently trying to understand how Kafka (0.8) can scale with our
> > > > > usage pattern and how to set up the partitioning.
> > > > >
> > > > > We want to route all messages belonging to the same id to the same
> > > > > queue, so its consumer will be able to consume all the messages for
> > > > > that id.
> > > > >
> > > > > My questions:
> > > > >
> > > > >  - From my understanding, in Kafka we would need to have a custom
> > > > > partitioner that routes messages with the same id to the same
> > > > > partition, right? I'm trying to find examples of writing this
> > > > > partitioner logic, but I can't find any. Can someone point me to an
> > > > > example?
> > > > >
> > > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+Producer+Example
> > > >
> > > > The partitioner here does a simple mod on the IP address and the # of
> > > > partitions. You'd need to define your own logic, but this is a start.
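
To make the "simple mod" idea from the thread concrete, here is a minimal, language-neutral sketch in Python. It is not Kafka's partitioner interface and is not taken from the linked wiki example. The names `partition_for` and `moved_fraction` are hypothetical, and the use of CRC32 as the hash is an assumption chosen only because it gives the same value in every process. The first function keeps all messages for one session id on one partition. The second illustrates the point made earlier in the thread: if the partition count changes, many ids map to a different partition.

```python
import zlib


def partition_for(session_id: str, num_partitions: int) -> int:
    """Map a session id to a partition index in [0, num_partitions)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # CRC32 is stable across processes, unlike Python's built-in hash()
    # for strings, which is randomized per interpreter run by default.
    return zlib.crc32(session_id.encode("utf-8")) % num_partitions


def moved_fraction(session_ids, old_count: int, new_count: int) -> float:
    """Fraction of ids whose partition changes when the count changes."""
    ids = list(session_ids)
    if not ids:
        return 0.0
    moved = sum(
        1
        for s in ids
        if partition_for(s, old_count) != partition_for(s, new_count)
    )
    return moved / len(ids)


if __name__ == "__main__":
    # Hypothetical session ids, used only for illustration.
    ids = [f"session-{i}" for i in range(10_000)]
    # The same id always lands on the same partition for a fixed count.
    print(partition_for("session-42", 8))
    # Growing from 8 to 12 partitions re-routes a share of the ids.
    print(moved_fraction(ids, 8, 12))
```

With a fixed partition count, the mapping is sticky. The second print shows why over-partitioning up front is the safer choice when consumers depend on that stickiness.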

 