Kafka >> mail # user >> Managing Millions of Partitions in Kafka


Re: Managing Millions of Partitions in Kafka
Kafka is designed to handle on the order of a few thousand partitions,
roughly fewer than 10,000. The main bottleneck is ZooKeeper. A better
way to design such a system is to have fewer partitions and use keyed
messages to distribute the data over a fixed set of partitions.
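
To illustrate the keyed-message approach, here is a minimal sketch of mapping user IDs onto a fixed set of partitions. It is not Kafka's own partitioner: the partition count of 64, the CRC32 hash, and the names partition_for and partition_counts are assumptions chosen for illustration only.

```python
import zlib

# Hypothetical fixed partition count, chosen only for this illustration.
NUM_PARTITIONS = 64


def partition_for(user_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key (here a user ID) to one of a fixed set of partitions."""
    # crc32 is stable across processes, unlike Python's built-in hash() on str.
    # This is an illustrative hash, not the one Kafka itself uses.
    return zlib.crc32(user_id.encode("utf-8")) % num_partitions


def partition_counts(user_ids, num_partitions: int = NUM_PARTITIONS):
    """Count how many keys land on each partition."""
    counts = [0] * num_partitions
    for uid in user_ids:
        counts[partition_for(uid, num_partitions)] += 1
    return counts


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(100_000)]
    counts = partition_counts(users)
    # Every user maps to exactly one of the 64 partitions.
    print(len(counts), sum(counts), min(counts), max(counts))
```

Because the same key always hashes to the same partition, all messages for one user stay together and in order, while the number of partitions stays fixed no matter how many users there are.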

Thanks,
Neha
On Oct 5, 2013 8:19 PM, "Ravindranath Akila" <[EMAIL PROTECTED]>
wrote:

> Initially, I thought dynamic topic creation could be used to maintain
> per-user data on Kafka. Then I read that partitions can and should be used
> for this instead.
>
> If a partition is to be used to map a user, can there be a million, or even
> a billion, partitions in a cluster? How does one go about designing such a
> model?
>
> Can the replication tool be used to assign, say partitions 1 - 10,000 on
> replica 1, and 10,001 - 20,000 on replica 2?
>
> If not, since there is a ulimit on the file system, should one model it
> based on a replica/topic/partition approach? Say users 1-10,000 go on topic
> 10k-1, which has 10,000 partitions, and users 10,001-20,000 go on topic
> 10k-2, which has 10,000 partitions.
>
> Simply put, how can a million stateful data points be handled? (I deduced
> that a userid-to-partition-number mapping can be done via a partitioner, but
> unless a server can be configured to handle only a given set of partitions,
> with a range-based notation, it is almost impossible to handle a large
> dataset. Is it that Kafka can only handle a limited set of stateful data
> right now?)
>
>
> http://stackoverflow.com/questions/17205561/data-modeling-with-kafka-topics-and-partitions
>
> Btw, why does Kafka have to keep each partition open? Can't a partition be
> opened for read/write only when needed?
>
> Thanks in advance!
>
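
For reference, the range-based topic/partition scheme proposed in the question can be sketched as follows. The function name topic_and_partition and the zero-based partition index are assumptions; the topic naming (10k-1, 10k-2, ...) and the block size of 10,000 come from the question itself.

```python
# Users per topic, as proposed in the question (one partition per user).
USERS_PER_TOPIC = 10_000


def topic_and_partition(user_number: int):
    """Map a 1-based user number to (topic name, 0-based partition index)."""
    if user_number < 1:
        raise ValueError("user numbers start at 1")
    block = (user_number - 1) // USERS_PER_TOPIC  # 0 for users 1-10,000
    partition = (user_number - 1) % USERS_PER_TOPIC
    return f"10k-{block + 1}", partition


def total_partitions(num_users: int) -> int:
    """One partition per user, so the cluster-wide total equals the user count."""
    return num_users


if __name__ == "__main__":
    print(topic_and_partition(1))
    print(topic_and_partition(10_001))
    print(total_partitions(1_000_000))
```

Under this scheme a million users still means a million partitions in the cluster, far above the figure of roughly 10,000 given in the reply, which is why splitting the users across topics does not remove the limit.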
