What I would like to see is a way for inactive topics to automatically get
removed after they are inactive for a period of time. That might help in
I added a comment to this larger jira:
Perhaps it should really be its own jira entry.
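
A minimal sketch of the expiry idea, assuming the time of the last write to each topic is tracked somewhere. The function `find_inactive_topics`, the `last_write_ts` mapping, and the three-day threshold are hypothetical names and values chosen for illustration; none of them are Kafka APIs, and the sketch only selects candidates for removal rather than deleting anything.

```python
import time


def find_inactive_topics(last_write_ts, max_idle_seconds, now=None):
    """Return (sorted) topics whose last write is older than max_idle_seconds."""
    if now is None:
        now = time.time()
    return sorted(
        topic
        for topic, ts in last_write_ts.items()
        if now - ts > max_idle_seconds
    )


DAY = 24 * 3600

# Hypothetical bookkeeping: topic name -> Unix timestamp of its last message.
last_write_ts = {
    "sensor-a": 1_000_000.0,            # idle for 7 days at `now`
    "sensor-b": 1_000_000.0 + 6 * DAY,  # idle for 1 day at `now`
    "sensor-c": 1_000_000.0 + 7 * DAY,  # written to just now
}
now = 1_000_000.0 + 7 * DAY

# Topics idle for more than 3 days are candidates for removal.
expired = find_inactive_topics(last_write_ts, max_idle_seconds=3 * DAY, now=now)
print(expired)
```

Passing `now` explicitly keeps the check deterministic; a periodic job could call the same function with the current time.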
On Tue, Oct 8, 2013 at 10:29 AM, Aniket Bhatnagar <
[EMAIL PROTECTED]> wrote:
> Thanks Neha. Is it worthwhile to investigate an option to store topic
> metadata (partitions, etc) into another consistent data store (MySQL,
> HBase, etc)? Should we make this feature pluggable?
> The reason I think we may need to surpass the 2000 total partition
> limit is because there may be genuine use cases to have high number of
> topics. For example, in my particular case, I am using Kafka as a buffer to
> store data arriving from various sensors deployed in physical world. These
> sensors may be short lived or may be long lived. I was thinking of having
> individual topics for each sensor. This way, if a badly behaving sensor
> attempts to push data at a much faster rate than we can process as a
> Kafka consumer, we will eventually overflow and start losing data for that
> particular sensor. However, we can still potentially continue to process
> data from other sensors that are pushing data at manageable rate. If I go
> with 1 topic for all the sensors, 1 misbehaving sensor can potentially keep
> us from catching up with the topic within the retention period, thus making
> us lose data from all sensors.
> The other issue is that if we go with a topic per sensor and the sensors
> are short lived and we have reached a threshold of 2000 sensors already
> deployed, Kafka will stop working (because of the Zookeeper limitation) even
> though the previously deployed sensors may not be active at all.
> I am sure there may be other genuine use cases for having a number of topics
> much larger than 2000.
> On 4 October 2013 19:04, Neha Narkhede <[EMAIL PROTECTED]> wrote:
> > You probably want to think of this in terms of number of partitions on a
> > single broker, instead of per topic since I/O is the limiting factor in
> > this case. Another factor to consider is total number of partitions in
> > cluster as Zookeeper becomes a limiting factor there. 30 partitions is not
> > too large provided the total number of partitions doesn't exceed roughly
> > a couple thousand. To give you an example, some of our clusters are 16
> > big and some of the topics on those clusters have 30 partitions.
> > Thanks,
> > Neha
> > On Oct 4, 2013 4:15 AM, "Aniket Bhatnagar" <[EMAIL PROTECTED]>
> > wrote:
> > > I am using kafka as a buffer for data streaming in from various
> > > Since it's time series data, I generate the key to the message by
> > > combining source ID and minute in the timestamp. This means I can
> > > have 60 partitions per topic (as each source has its own topic). I have
> > > set num.partitions to be 30 (60/2) for each topic in broker config. I don't
> > > have a very good reason to pick 30 as default number of partitions per
> > > topic but I wanted it to be a high number so that I can achieve high
> > > parallelism during in-stream processing. I am worried that a
> > > number like 30 (default configuration had it as 2) can negatively
> > > impact kafka performance in terms of message throughput or memory
> > > consumption. I understand that this can lead to many files per
> > > but I am thinking of dealing with it by having multiple directories on the
> > > same disk if at all I run into issues.
> > >
> > > My question to the community is: am I prematurely attempting to
> > > optimize the partition number, as right now even a partition number of 5
> > > seems sufficient, and hence will I run into unwanted issues? Or is 30 an
> > > number to use for number of partitions?
> > >