Kafka >> mail # dev >> Random Partitioning Issue

Re: Random Partitioning Issue
I agree that minimizing the number of producer connections (while
a good thing) is really required only in very large production
deployments, and that the net effect of the existing change is
counter-intuitive to users who expect an immediate even distribution
across _all_ partitions of the topic.

However, I don't think it is a hack because it is almost exactly the
same behavior as 0.7 in one of its modes. The 0.7 producer (which I
think was even more confusing) had three modes:
i) ZK send
ii) Config send(a): static list of broker1:port1,broker2:port2,etc.
iii) Config send(b): static list of a hardwareVIP:VIPport

(i) and (ii) would achieve even distribution. (iii) would effectively
select one broker and distribute to partitions on that broker within
each reconnect interval. (iii) is very similar to what we now do in
0.8. (Although we stick to one partition during each metadata refresh
interval, that can be changed to stick to one broker and distribute
across partitions on that broker.)
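
To make the difference concrete, here is a minimal simulation sketch of the two behaviors for a single producer sending null-key messages. It is not Kafka code: the function names, the partition count, and the idea of measuring the refresh interval in messages rather than in time are all assumptions made for illustration.

```python
import random
from collections import Counter

def random_per_message(num_messages, num_partitions, rng):
    """Pick a new random partition for every message."""
    return Counter(rng.randrange(num_partitions) for _ in range(num_messages))

def sticky_per_interval(num_messages, num_partitions, messages_per_interval, rng):
    """Pick a random partition once per (simulated) refresh interval and reuse it."""
    counts = Counter()
    partition = None
    for i in range(num_messages):
        # A new interval starts every `messages_per_interval` messages.
        if i % messages_per_interval == 0:
            partition = rng.randrange(num_partitions)
        counts[partition] += 1
    return counts

rng = random.Random(0)
# 10,000 messages to a topic with 8 partitions.
spread = random_per_message(10_000, 8, rng)
# Same load, but the whole run falls inside one refresh interval.
sticky = sticky_per_interval(10_000, 8, 10_000, rng)
print("partitions used, random per message:", len(spread))
print("partitions used, sticky per interval:", len(sticky))
```

Within one interval the sticky variant writes to exactly one partition, which is the effect a user with only a few producers would observe. With many independent producers, each picking its own partition, the aggregate load evens out.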

At the same time, I agree with Joe's suggestion that we should keep
the more intuitive pre-KAFKA-1017 behavior as the default and move the
change in KAFKA-1017 to a more specific partitioner implementation.
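
As a rough sketch of what that split could look like, the example below models a default partitioner that picks a fresh random partition per message and a separate opt-in partitioner that sticks to one partition per interval. The class names, the `partition(available)` signature, and the injectable clock are hypothetical and are not the Kafka partitioner interface; passing the list of currently available partitions into the partitioner is an assumption about how the interface might be extended.

```python
import random
import time

class RandomPartitioner:
    """Hypothetical default: a new random available partition per message."""

    def __init__(self, rng=None):
        self._rng = rng or random.Random()

    def partition(self, available):
        # `available` is the list of partition ids that can currently accept writes.
        return self._rng.choice(list(available))

class StickyPartitioner:
    """Hypothetical opt-in: reuse one partition until the interval expires."""

    def __init__(self, interval_s, clock=time.monotonic, rng=None):
        self._interval_s = interval_s
        self._clock = clock
        self._rng = rng or random.Random()
        self._current = None
        self._chosen_at = None

    def partition(self, available):
        now = self._clock()
        expired = (self._chosen_at is None
                   or now - self._chosen_at >= self._interval_s)
        # Re-pick when the interval is over or the chosen partition went away.
        if expired or self._current not in available:
            self._current = self._rng.choice(list(available))
            self._chosen_at = now
        return self._current
```

A producer that needs to limit connections would configure the sticky variant explicitly, while everyone else keeps the even distribution by default.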

On Sun, Sep 15, 2013 at 8:44 AM, Jay Kreps <[EMAIL PROTECTED]> wrote:
> Let me ask another question which I think is more objective. Let's say 100
> random, smart infrastructure specialists try Kafka, of these 100 how many
> do you believe will
> 1. Say that this behavior is what they expected to happen?
> 2. Be happy with this behavior?
> I am not being facetious; I am genuinely looking for a numerical estimate. I
> am trying to figure out if nobody thought about this or if my estimate is
> just really different. For what it is worth my estimate is 0 and 5
> respectively.
> This would be fine except that we changed it from the good behavior to the
> bad behavior to fix an issue that probably only we have.
> -Jay
> On Sun, Sep 15, 2013 at 8:37 AM, Jay Kreps <[EMAIL PROTECTED]> wrote:
>> I just took a look at this change. I agree with Joe, not to put too fine a
>> point on it, but this is a confusing hack.
>> Jun, I don't think wanting to minimize the number of TCP connections is
>> going to be a very common need for people with less than 10k producers. I
>> also don't think people are going to get very good load balancing out of
>> this because most people don't have a ton of producers. I think instead we
>> will spend the next year explaining this behavior which 99% of people will
>> think is a bug (because it is crazy, non-intuitive, and breaks their usage).
>> Why was this done by adding special default behavior in the null key case
>> instead of as a partitioner? The argument that the partitioner interface
>> doesn't have sufficient information to choose a partition is not a good
>> argument for hacking in changes to the default, it is an argument for
>> *improving* the partitioner interface.
>> The whole point of a partitioner interface is to make it possible to plug
>> in non-standard behavior like this, right?
>> -Jay
>> On Sat, Sep 14, 2013 at 8:15 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
>>> Joe,
>>> Thanks for bringing this up. I want to clarify this a bit.
>>> 1. Currently, the producer side logic is that if the partitioning key is
>>> not provided (i.e., it is null), the partitioner won't be called. We did
>>> that because we want to select a random and "available" partition to send
>>> messages so that if some partitions are temporarily unavailable (because of
>>> broker failures), messages can still be sent to other partitions. Doing
>>> this in the partitioner is difficult since the partitioner doesn't know
>>> which partitions are currently available (the DefaultEventHandler does).
>>> 2. As Joel said, the common use case in production is that there are many
>>> more producers than #partitions in a topic. In this case, sticking to a
>>> partition for a few minutes is not going to cause too much imbalance in