Kafka >> mail # user >> partitioning


Re: partitioning
I'm not sure which design doc you are looking at (v1, probably? v3 is here:
https://cwiki.apache.org/KAFKA/kafka-detailed-replication-design-v3.html ),
but if I understand correctly, consistent hashing for partitioning is mostly
about remapping as few keys as possible when adding or deleting partitions,
which you can already implement with a custom partitioner by computing
partition_id = abs(num_partitions * hash(key) / hash_space). But the added
value is limited by the fact that as soon as you add or delete partitions
you break your existing key-to-partition mapping anyway, which makes it
kind of useless?
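
To make the formula above concrete, here is a minimal sketch in Python (not
Kafka's partitioner API). The names stable_hash, range_partition,
mod_partition, moved_keys and the 32-bit HASH_SPACE are assumptions made for
illustration only. It compares the range-scaling formula with plain modulo by
counting how many keys change partition when the partition count changes.

```python
import hashlib

HASH_SPACE = 2 ** 32  # assumed size of the hash range for this sketch


def stable_hash(key: str) -> int:
    """Deterministic hash in [0, HASH_SPACE); stands in for hash(key)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def range_partition(key: str, num_partitions: int) -> int:
    """partition_id = num_partitions * hash(key) / hash_space.

    The hash is non-negative here, so the abs() from the formula is not needed.
    """
    return num_partitions * stable_hash(key) // HASH_SPACE


def mod_partition(key: str, num_partitions: int) -> int:
    """Plain 'mod key by # of partitions' strategy, for comparison."""
    return stable_hash(key) % num_partitions


def moved_keys(partition_fn, keys, old_n: int, new_n: int) -> int:
    """Count the keys whose partition changes when going from old_n to new_n."""
    return sum(1 for k in keys if partition_fn(k, old_n) != partition_fn(k, new_n))
```

For example, moved_keys(range_partition, keys, 8, 9) can be compared against
moved_keys(mod_partition, keys, 8, 9). With uniformly distributed hashes,
range scaling moves roughly half of the keys when one partition is added,
while modulo moves nearly all of them. A real consistent-hash ring would move
fewer still, so treat this formula as an approximation.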

However, if you are talking about partition assignment to brokers, that's
a separate question. I guess the current behavior is just a simple round
robin that assigns partitions at topic creation (to be confirmed?). It
could be interesting to have partitions assigned to brokers with some
consistent hashing, so that adding a broker requires moving as few
partitions as possible. That process is done manually as of now using a
ReassignPartition command, and it could be automated with a tool (provided
that you over-partition as Jun recommends, so that you have some
granularity and the load gets spread evenly over brokers).
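
A rough sketch of that idea in Python, assuming a hash ring with virtual
nodes. This is not how Kafka assigns partitions; build_ring, assign, the
vnodes parameter, and the broker and partition names are all hypothetical.
The point it illustrates is that when a broker is added, the only partitions
that move are the ones taken over by the new broker.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Deterministic 64-bit position on the ring."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


def build_ring(brokers, vnodes: int = 64):
    """Place `vnodes` points per broker on the ring, sorted by position."""
    return sorted((ring_hash(f"{b}#{i}"), b) for b in brokers for i in range(vnodes))


def assign(partitions, brokers, vnodes: int = 64):
    """Map each partition to the broker owning the next point clockwise."""
    ring = build_ring(brokers, vnodes)
    points = [pos for pos, _ in ring]
    assignment = {}
    for part in partitions:
        # Wrap around to the first point when past the last one.
        idx = bisect.bisect_right(points, ring_hash(part)) % len(ring)
        assignment[part] = ring[idx][1]
    return assignment


partitions = [f"mytopic-{i}" for i in range(64)]  # over-partitioned topic
before = assign(partitions, ["broker1", "broker2", "broker3"])
after = assign(partitions, ["broker1", "broker2", "broker3", "broker4"])
moved = [p for p in partitions if before[p] != after[p]]
print(f"{len(moved)} of {len(partitions)} partitions would move to broker4")
```

A tool built along these lines would still have to issue the actual
reassignment for each partition in `moved`. The more partitions per broker,
the more evenly the ring tends to spread them.
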
On Mon, Jan 14, 2013 at 5:04 PM, Stan Rosenberg <[EMAIL PROTECTED]> wrote:

> On Fri, Jan 11, 2013 at 12:37 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
>
> > Our current partitioning strategy is to mod key by # of partitions, not #
> > brokers. For better balancing of partitions over brokers, one simple
> > strategy is to over-partition, i.e., to have a few times more partitions
> > than brokers. That way, if one adds more brokers over time, you can just
> > move some existing partitions to the new broker.
> >
> > Consistent hashing requires changing # partitions dynamically. However,
> > some applications may prefer not to change partitions.
> >
>
> > What's your use case for consistent hashing?
> >
>
> My use case is essentially the same as above, i.e., dynamic load balancing.
>  I now understand why the current partitioning strategy is used as opposed
> to consistent hashing; partition "stickiness"
> is definitely to be desired for the sake of moving computation to data.
> However, the dynamic rebalancing as described in "Kafka Replication
> Design", sect. 1.2 looks very similar to what's typically achieved by using
> consistent hashing.
> Is this rebalancing implemented in 0.8 or am I reading the now obsolete
> documentation? :)  (If yes, could you please refer me to the code.)
>
> Thanks,
>
> stan
>

 