Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Kafka, mail # user - Re: partitioning

Stan Rosenberg 2013-01-14, 16:05
Copy link to this message
Re: partitioning
Maxime Brugidou 2013-01-14, 16:37
I'm not sure what design doc you are looking at (v1 probably?, v3 is here:
https://cwiki.apache.org/KAFKA/kafka-detailed-replication-design-v3.html )
but If I understand correctly, consistent hashing for partitioning is more
about remapping as few keys as possible when adding/deleting partitions,
which you can implement already with a custom partitioner by doing
partition_id = abs(num_partitions * hash(key)/hash_space). But the added
value is mitigated by the fact that if you add/delete partitions you
already destroy your partitioning and make it kind of useless?

However if you are talking about partition assignment to brokers that's
another part, and I guess the current state of things is just a simple
round robin to assign partitions on topic creation (to be confirmed?). It
could be interesting to have partitions assigned to brokers with some
consistent hashing so that adding a broker requires moving as few
partitions as possible. That process is done manually as of now using a
ReassignPartition command, and it could be automated with a tool (provided
that you over-partition as Jun recommends, so that you have some
granularity and the load gets spread evenly over brokers).
On Mon, Jan 14, 2013 at 5:04 PM, Stan Rosenberg <[EMAIL PROTECTED]>wrote:

> On Fri, Jan 11, 2013 at 12:37 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
> > Our current partitioning strategy is to mod key by # of partitions, not #
> > brokers. For better balancing partitions over brokers, one simple
> strategy
> > is to over partition, i.e., you have a few times of more partitions than
> > brokers. That way, if one adds more brokers overtime, you can just move
> > some existing partitions to the new broker.
> >
> > Consistent hashing requires changing # partitions dynamically.However,
> for
> > some applications, they may prefer not to change partitions.
> >
> > What's your use case for consistent hashing?
> >
> My use case is essentially the same as above, i.e., dynamic load balancing.
>  I now understand why the current partitioning strategy is used as opposed
> to consistent hashing; partition "stickiness"
> is definitely to be desired for the sake of moving computation to data.
> However, the dynamic rebalancing as described in "Kafka Replication
> Design", sect. 1.2 looks very similar to what's typically achieved by using
> consistent hashing.
> Is this rebalancing implemented in 0.8 or am I reading the now obsolete
> documentation? :)  (If yes, could you please refer me to the code.)
> Thanks,
> stan