Kafka, mail # user - Managing Millions of Partitions in Kafka

Re: Managing Millions of Partitions in Kafka
Neha Narkhede 2013-10-06, 08:00
Kafka is designed to handle on the order of a few thousand partitions,
roughly fewer than 10,000, and the main bottleneck is ZooKeeper. A better
way to design such a system is to have fewer partitions and use keyed
messages to distribute the data over a fixed set of partitions.
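
A minimal sketch of that keyed approach, assuming the user id is the
message key and the topic has a fixed partition count. The names
NUM_PARTITIONS and partition_for are hypothetical, and the CRC32 hash is
only a stand-in: it is not the hash that Kafka's own partitioner uses.

```python
import zlib

# Hypothetical fixed partition count for the topic (an assumption for
# illustration; pick a value well below the limits discussed above).
NUM_PARTITIONS = 64


def partition_for(user_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a user id (the message key) to one of a fixed set of partitions.

    Uses CRC32 as a stable stand-in hash, so the same key always lands on
    the same partition. Kafka's real partitioner uses a different hash.
    """
    return zlib.crc32(user_id.encode("utf-8")) % num_partitions


if __name__ == "__main__":
    # Many users share each partition; per-user ordering is preserved
    # because all messages with the same key go to the same partition.
    for uid in ("user-1", "user-2", "user-1000000"):
        print(uid, "->", partition_for(uid))
```

With this layout a million users share 64 partitions instead of needing
one partition each, and a consumer filters or groups by key to recover
the per-user stream.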

On Oct 5, 2013 8:19 PM, "Ravindranath Akila" <[EMAIL PROTECTED]> wrote:

> Initially, I thought dynamic topic creation could be used to maintain
> per-user data on Kafka. Then I read that partitions can and should be used
> for this instead.
> If a partition is to be used to map a user, can there be a million, or even
> a billion, partitions in a cluster? How does one go about designing such a
> model?
> Can the replication tool be used to assign, say partitions 1 - 10,000 on
> replica 1, and 10,001 - 20,000 on replica 2?
> If not, since there is a ulimit on the file system, should one model it
> based on a replica/topic/partition approach? Say users 1-10,000 go on topic
> 10k-1, which has 10,000 partitions, and users 10,001-20,000 go on topic
> 10k-2, which has 10,000 partitions.
> Simply put, how can a million stateful data points be handled? (I deduced
> that a userid-partition number mapping can be done via a partitioner, but
> unless a server can be configured to handle only a given set of partitions,
> with a range-based notation, it is almost impossible to handle a large
> dataset. Is it that Kafka can only handle a limited set of stateful data
> right now?)
> http://stackoverflow.com/questions/17205561/data-modeling-with-kafka-topics-and-partitions
> Btw, why does Kafka have to keep each partition open? Can't a partition be
> opened for read/write only when needed?
> Thanks in advance!