Kafka, mail # user - Re: Is 30 a too high partition number? - 2013-10-08, 14:29

Re: Is 30 a too high partition number?
Thanks Neha. Is it worthwhile to investigate an option to store topic
metadata (partitions, etc.) in another consistent data store (MySQL,
HBase, etc.)? Should we make this feature pluggable?

The reason I think we may need to surpass the limit of 2000 total
partitions is that there may be genuine use cases for a high number of
topics. For example, in my case, I am using Kafka as a buffer to store
data arriving from various sensors deployed in the physical world. These
sensors may be short-lived or long-lived. I was thinking of having an
individual topic for each sensor. This way, if a badly behaving sensor
attempts to push data at a much faster rate than we can process as a
Kafka consumer, we will eventually overflow and start losing data for that
particular sensor. However, we can still continue to process data from the
other sensors that are pushing data at a manageable rate. If I go with one
topic for all the sensors, one misbehaving sensor can prevent us from
catching up with the topic within the retention period, making us lose
data from all sensors.
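
To make the isolation argument concrete, here is a toy sketch in plain
Python. It is not Kafka code: the function name, the fixed-size queues
standing in for retention, and the per-tick rates are all assumptions made
up for illustration. It counts how many unread messages each sensor loses
when the oldest unread message is evicted once a queue is full.

```python
from collections import deque


def simulate(rates, capacity, consume_per_tick, ticks, per_sensor_topics):
    """Return {sensor: messages evicted before being consumed}.

    rates: {sensor: messages produced per tick} (hypothetical numbers)
    capacity: total unread messages retained, a crude stand-in for retention
    """
    sensors = list(rates)
    if per_sensor_topics:
        # One queue per sensor; capacity and consumer time are split evenly.
        queues = {s: deque() for s in sensors}
        cap = capacity // len(sensors)
        quota = consume_per_tick // len(sensors)
    else:
        # Every sensor writes to the same queue.
        shared = deque()
        queues = {s: shared for s in sensors}
        cap = capacity
        quota = consume_per_tick

    lost = {s: 0 for s in sensors}
    for _tick in range(ticks):
        # Produce: when a queue is full, its oldest unread message expires.
        for s in sensors:
            q = queues[s]
            for _msg in range(rates[s]):
                if len(q) == cap:
                    lost[q.popleft()] += 1
                q.append(s)
        # Consume: read up to `quota` messages from each distinct queue.
        seen = set()
        for q in queues.values():
            if id(q) in seen:
                continue
            seen.add(id(q))
            for _read in range(min(quota, len(q))):
                q.popleft()
    return lost


rates = {"s1": 1, "s2": 1, "bad": 50}  # "bad" floods the system
isolated = simulate(rates, capacity=30, consume_per_tick=6, ticks=100,
                    per_sensor_topics=True)
shared = simulate(rates, capacity=30, consume_per_tick=6, ticks=100,
                  per_sensor_topics=False)
print("topic per sensor:", isolated)
print("single topic:    ", shared)
```

With these made-up numbers, the per-sensor layout loses data only for the
flooding sensor, while the single-topic layout also loses data from the
well-behaved ones. Real Kafka retention is time- or size-based per
partition, so this only models the shape of the problem.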

The other issue is that if we go with a topic per sensor, the sensors are
short-lived, and we have reached the threshold of 2000 sensors already
deployed, Kafka will stop working (because of the ZooKeeper limitation)
even though the previously deployed sensors may not be active at all.

I am sure there may be other genuine use cases for having many more than
2000 topics.
On 4 October 2013 19:04, Neha Narkhede <[EMAIL PROTECTED]> wrote:
 