Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Kafka >> mail # dev >> Random Partitioning Issue

Joe Stein 2013-09-14, 05:11
Joel Koshy 2013-09-14, 12:17
Joe Stein 2013-09-14, 18:19
Jun Rao 2013-09-15, 03:15
Copy link to this message
Re: Random Partitioning Issue
I just took a look at this change. I agree with Joe, not to put to fine a
point on it, but this is a confusing hack.

Jun, I don't think wanting to minimizing the number of TCP connections is
going to be a very common need for people with less than 10k producers. I
also don't think people are going to get very good load balancing out of
this because most people don't have a ton of producers. I think instead we
will spend the next year explaining this behavior which 99% of people will
think is a bug (because it is crazy, non-intuitive, and breaks their usage).

Why was this done by adding special default behavior in the null key case
instead of as a partitioner? The argument that the partitioner interface
doesn't have sufficient information to choose a partition is not a good
argument for hacking in changes to the default, it is an argument for *
improving* the partitioner interface.

The whole point of a partitioner interface is to make it possible to plug
in non-standard behavior like this, right?

On Sat, Sep 14, 2013 at 8:15 PM, Jun Rao <[EMAIL PROTECTED]> wrote:

> Joe,
> Thanks for bringing this up. I want to clarify this a bit.
> 1. Currently, the producer side logic is that if the partitioning key is
> not provided (i.e., it is null), the partitioner won't be called. We did
> that because we want to select a random and "available" partition to send
> messages so that if some partitions are temporarily unavailable (because of
> broker failures), messages can still be sent to other partitions. Doing
> this in the partitioner is difficult since the partitioner doesn't know
> which partitions are currently available (the DefaultEventHandler does).
> 2. As Joel said, the common use case in production is that there are many
> more producers than #partitions in a topic. In this case, sticking to a
> partition for a few minutes is not going to cause too much imbalance in the
> partitions and has the benefit of reducing the # of socket connections. My
> feeling is that this will benefit most production users. In fact, if one
> uses a hardware load balancer for producing data in 0.7, it behaves in
> exactly the same way (a producer will stick to a broker until the reconnect
> interval is reached).
> 3. It is true that If one is testing a topic with more than one partition
> (which is not the default value), this behavior can be a bit weird.
> However, I think it can be mitigated by running multiple test producer
> instances.
> 4. Someone reported in the mailing list that all data shows in only one
> partition after a few weeks. This is clearly not the expected behavior. We
> can take a closer look to see if this is real issue.
> Do you think these address your concerns?
> Thanks,
> Jun
> On Sat, Sep 14, 2013 at 11:18 AM, Joe Stein <[EMAIL PROTECTED]> wrote:
> > How about creating a new class called RandomRefreshPartioner and copy the
> > DefaultPartitioner code to it and then revert the DefaultPartitioner
> code.
> >  I appreciate this is a one time burden for folks using the existing
> > 0.8-beta1 bumping into KAFKA-1017 in production having to switch to the
> > RandomRefreshPartioner and when folks deploy to production will have to
> > consider this property change.
> >
> > I make this suggestion keeping in mind the new folks that on board with
> > Kafka and when everyone is in development and testing mode for the first
> > time their experience would be as expected from how it would work in
> > production this way.  In dev/test when first using Kafka they won't have
> so
> > many producers for partitions but would look to parallelize their
> consumers
> > IMHO.
> >
> > The random broker change sounds like maybe a bigger change now this late
> > in the release cycle if we can accommodate folks trying Kafka for the
> first
> > time and through their development and testing along with full blown
> > production deploys.
> >
> > /*******************************************
> >  Joe Stein
> >  Founder, Principal Consultant

Jay Kreps 2013-09-15, 15:45
Joel Koshy 2013-09-17, 17:19
Jay Kreps 2013-09-17, 17:41
Joe Stein 2013-09-18, 23:23
Jun Rao 2013-09-23, 00:57
Joe Stein 2013-09-27, 16:24
Jun Rao 2013-09-27, 16:46
Joe Stein 2013-09-27, 16:54
Jun Rao 2013-09-27, 17:10
Joe Stein 2013-09-27, 17:31
Jun Rao 2013-09-28, 04:12
Guozhang Wang 2013-09-29, 04:52
Jun Rao 2013-09-29, 16:15
Joe Stein 2013-10-01, 05:22
Jun Rao 2013-10-01, 15:27
Joe Stein 2013-10-01, 15:35
Jun Rao 2013-10-01, 16:32