Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Kafka >> mail # dev >> [jira] [Commented] (KAFKA-347) change number of partitions of a topic online

Copy link to this message
[jira] [Commented] (KAFKA-347) change number of partitions of a topic online

    [ https://issues.apache.org/jira/browse/KAFKA-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554062#comment-13554062 ]

Jay Kreps commented on KAFKA-347:

Well i think there are really two mappings here:
key => partition
partition => broker

This is the generalization of consistent hashing that most persistent data systems use.

In Kafka key=>partition is user-defined (Partitioner interface) and defaults to just hash(key)%num_partitions. partition=>broker is assigned at topic creation time and from then on is semi-static (changing it is an admin command). So when adding a broker we already can move just the number of partitions we need by having the tool compute the best set of partitions to migrate or choosing at random.

So the idea is that you over-partition and then the partition count doesn't change and hence the key=>partition assignment doesn't change.

The question is, do we need to support changing the number of partitions to handle the case where you don't over-partition by enough? If you do this then the change in mapping would be large. That could be helped a bit by a consistent hash partitioner for the key=>partition mapping on the client side, but even in that case you would still have lots of values that are now in the wrong partition, so any code that depended on the partitioning would be broken.

Alternately you could do the hard work of actually implementing partition splitting on the broker by having the broker split a partition into two and then migrating the new partitions.

The question I would ask is, is any of this worth it? Many data system don't support partition splitting they just say "choose your partition count wisely or else delete it and start fresh". Arguably most messaging use cases are particularly concerned with recent data so this might be a fine answer. So an alternate strategy would just be to spend the time working on scaling the number of partitions we can handle and over-partitioning heavily.

> change number of partitions of a topic online
> ---------------------------------------------
>                 Key: KAFKA-347
>                 URL: https://issues.apache.org/jira/browse/KAFKA-347
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jun Rao
>              Labels: features
> We will need an admin tool to change the number of partitions of a topic online.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira