Kafka, mail # user - Is 30 a too high partition number? - 2013-10-04, 09:15
 Search Hadoop and all its subprojects:

Switch to Threaded View
Copy link to this message
Is 30 a too high partition number?
I am using kafka as a buffer for data streaming in from various sources.
Since its a time series data, I generate the key to the message by
combining source ID and minute in the timestamp. This means I can utmost
have 60 partitions per topic (as each source has its own topic). I have
set num.partitions to be 30 (60/2) for each topic in broker config. I don't
have a very good reason to pick 30 as default number of partitions per
topic but I wanted it to be a high number so that I can achieve high
parallelism during in-stream processing. I am worried that having a high
number  like 30 (default configuration had it as 2), it can negatively
impact kafka performance in terms of message throughput or memory
consumption. I understand that this can lead to many files per partition
but I am thinking of dealing with it by having multiple directories on the
same disk if at all I run into issues.

My question to the community is that am I prematurely attempting to
optimizing the partition number as right now even a partition number of 5
seems sufficient and hence will run into unwanted issues? Or is 30 an Ok
number to use for number of partitions?

NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB