Kafka >> mail # user >> Is 30 a too high partition number?


Is 30 a too high partition number?
I am using Kafka as a buffer for data streaming in from various sources.
Since it is time-series data, I generate the key for each message by
combining the source ID and the minute of the timestamp. This means I can
have at most 60 partitions per topic (each source has its own topic, so a
topic sees at most 60 distinct keys). I have set num.partitions to 30 (60/2)
in the broker config, so every topic gets 30 partitions. I don't have a very
good reason for picking 30 as the default number of partitions per topic,
but I wanted a high number so that I can achieve high parallelism during
in-stream processing. I am worried that a number as high as 30 (the default
configuration had 2) could negatively impact Kafka performance in terms of
message throughput or memory consumption. I understand that this can lead
to many files per partition, but if I run into issues I plan to deal with
that by using multiple directories on the same disk.
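
To make the keying scheme described above concrete, here is a minimal sketch of how the 60 possible keys of one topic might spread over 30 partitions. It is not my actual producer code: the names make_key, partition_for and keys_per_partition, the source ID "sensor-a" and the fixed date are all made up for illustration, and CRC32 modulo the partition count is only a stand-in for whatever partitioner the Kafka producer in use really applies.

```python
import zlib
from collections import Counter
from datetime import datetime, timezone

NUM_PARTITIONS = 30  # mirrors num.partitions from the broker config


def make_key(source_id: str, ts: datetime) -> str:
    """Build the message key from the source ID and the minute of the timestamp."""
    return f"{source_id}-{ts.minute:02d}"


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition index.

    Assumption: CRC32 modulo the partition count stands in for the real
    producer partitioner, which may use a different hash function.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def keys_per_partition(source_id: str, num_partitions: int = NUM_PARTITIONS) -> Counter:
    """Count how many of the 60 possible keys land on each partition."""
    counts = Counter()
    for minute in range(60):
        # The date is arbitrary; only the minute field feeds into the key.
        ts = datetime(2014, 1, 1, 0, minute, tzinfo=timezone.utc)
        counts[partition_for(make_key(source_id, ts), num_partitions)] += 1
    return counts


if __name__ == "__main__":
    counts = keys_per_partition("sensor-a")
    print(f"partitions receiving data: {len(counts)} of {NUM_PARTITIONS}")
    print(f"most keys on one partition: {max(counts.values())}")
```

Because a topic only ever sees 60 distinct keys, hash collisions can leave some of the 30 partitions with no data while others receive several keys, so the parallelism actually achieved may be lower than the configured partition count.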

My question to the community: am I optimizing the partition number
prematurely, given that even 5 partitions seem sufficient right now, and
will I run into unwanted issues as a result? Or is 30 an OK number of
partitions to use?
