Kafka >> mail # dev >> Random Partitioning Issue


Re: Random Partitioning Issue
I just took a look at this change. I agree with Joe, not to put too fine a
point on it, but this is a confusing hack.

Jun, I don't think wanting to minimize the number of TCP connections is
going to be a very common need for people with fewer than 10k producers. I
also don't think people are going to get very good load balancing out of
this because most people don't have a ton of producers. I think instead we
will spend the next year explaining this behavior which 99% of people will
think is a bug (because it is crazy, non-intuitive, and breaks their usage).

Why was this done by adding special default behavior in the null key case
instead of as a partitioner? The argument that the partitioner interface
doesn't have sufficient information to choose a partition is not a good
argument for hacking in changes to the default; it is an argument for
*improving* the partitioner interface.

The whole point of a partitioner interface is to make it possible to plug
in non-standard behavior like this, right?
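
To make the argument above concrete, here is a minimal Java sketch of a partitioner interface that is told which partitions are currently available, so the "stick to a random available partition" behavior for null keys can live in a pluggable class instead of the default path. Everything in it is hypothetical: `AvailabilityAwarePartitioner`, `StickyRandomPartitioner`, the `availablePartitions` parameter, and the refresh interval are illustrative names and assumptions, not the actual Kafka 0.8 API.

```java
import java.util.List;
import java.util.Random;

/** Hypothetical contract (NOT the Kafka 0.8 interface): the caller passes in availability. */
interface AvailabilityAwarePartitioner<K> {
    /**
     * @param key                 message key, may be null
     * @param numPartitions       total number of partitions in the topic
     * @param availablePartitions ids of partitions that can currently accept writes
     * @return the chosen partition id
     */
    int partition(K key, int numPartitions, List<Integer> availablePartitions);
}

/**
 * Illustrative opt-in partitioner: for null keys it sticks to one randomly chosen
 * available partition until the refresh interval elapses or that partition
 * becomes unavailable. Keyed messages are hashed over all partitions.
 */
class StickyRandomPartitioner<K> implements AvailabilityAwarePartitioner<K> {
    private final long refreshIntervalMs;
    private final Random random;
    private int current = -1;   // -1 means no partition has been picked yet
    private long lastPickMs;

    StickyRandomPartitioner(long refreshIntervalMs, Random random) {
        this.refreshIntervalMs = refreshIntervalMs;
        this.random = random;
    }

    @Override
    public int partition(K key, int numPartitions, List<Integer> availablePartitions) {
        return partition(key, numPartitions, availablePartitions, System.currentTimeMillis());
    }

    /** Same logic with an explicit clock value, which keeps the behavior testable. */
    int partition(K key, int numPartitions, List<Integer> availablePartitions, long nowMs) {
        if (key != null) {
            // Keyed messages: a stable, non-negative hash-based choice.
            return Math.floorMod(key.hashCode(), numPartitions);
        }
        if (availablePartitions.isEmpty()) {
            throw new IllegalStateException("no available partitions");
        }
        boolean expired = nowMs - lastPickMs >= refreshIntervalMs;
        if (current < 0 || expired || !availablePartitions.contains(current)) {
            // Re-pick at random among the partitions that are currently available.
            current = availablePartitions.get(random.nextInt(availablePartitions.size()));
            lastPickMs = nowMs;
        }
        return current;
    }
}
```

With an interface along these lines, a default partitioner could pick a fresh random available partition per message, and anyone who wants fewer socket connections could configure the sticky variant explicitly.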

-Jay
On Sat, Sep 14, 2013 at 8:15 PM, Jun Rao <[EMAIL PROTECTED]> wrote:

> Joe,
>
> Thanks for bringing this up. I want to clarify this a bit.
>
> 1. Currently, the producer side logic is that if the partitioning key is
> not provided (i.e., it is null), the partitioner won't be called. We did
> that because we want to select a random and "available" partition to send
> messages so that if some partitions are temporarily unavailable (because of
> broker failures), messages can still be sent to other partitions. Doing
> this in the partitioner is difficult since the partitioner doesn't know
> which partitions are currently available (the DefaultEventHandler does).
>
> 2. As Joel said, the common use case in production is that there are many
> more producers than #partitions in a topic. In this case, sticking to a
> partition for a few minutes is not going to cause too much imbalance in the
> partitions and has the benefit of reducing the # of socket connections. My
> feeling is that this will benefit most production users. In fact, if one
> uses a hardware load balancer for producing data in 0.7, it behaves in
> exactly the same way (a producer will stick to a broker until the reconnect
> interval is reached).
>
> 3. It is true that if one is testing a topic with more than one partition
> (which is not the default value), this behavior can be a bit weird.
> However, I think it can be mitigated by running multiple test producer
> instances.
>
> 4. Someone reported on the mailing list that all data shows up in only one
> partition after a few weeks. This is clearly not the expected behavior. We
> can take a closer look to see if this is a real issue.
>
> Do you think these address your concerns?
>
> Thanks,
>
> Jun
>
>
>
> On Sat, Sep 14, 2013 at 11:18 AM, Joe Stein <[EMAIL PROTECTED]> wrote:
>
> > How about creating a new class called RandomRefreshPartioner, copying the
> > DefaultPartitioner code to it, and then reverting the DefaultPartitioner
> > code? I appreciate this is a one-time burden for folks using the existing
> > 0.8-beta1 who bump into KAFKA-1017 in production and have to switch to the
> > RandomRefreshPartioner, and folks deploying to production will have to
> > consider this property change.
> >
> > I make this suggestion keeping in mind the new folks who come on board
> > with Kafka: when everyone is in development and testing mode for the first
> > time, their experience would match how it would work in production. In
> > dev/test, when first using Kafka, they won't have so many producers for
> > partitions but would look to parallelize their consumers, IMHO.
> >
> > The random broker change sounds like maybe a bigger change now this late
> > in the release cycle if we can accommodate folks trying Kafka for the
> > first time and through their development and testing along with full blown
> > production deploys.
> >
> > /*******************************************
> >  Joe Stein
> >  Founder, Principal Consultant

 