Kafka >> mail # dev >> Random Partitioning Issue


Joe Stein 2013-09-14, 05:11
Joel Koshy 2013-09-14, 12:17
Joe Stein 2013-09-14, 18:19
Jun Rao 2013-09-15, 03:15
Jay Kreps 2013-09-15, 15:37
Jay Kreps 2013-09-15, 15:45
Joel Koshy 2013-09-17, 17:19
Jay Kreps 2013-09-17, 17:41
Re: Random Partitioning Issue
Sounds good, I will create a JIRA and upload a patch.
/*******************************************
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop
********************************************/
On Sep 17, 2013, at 1:19 PM, Joel Koshy <[EMAIL PROTECTED]> wrote:

> I agree that minimizing the number of producer connections (while
> being a good thing) is really required only in very large production
> deployments, and the net-effect of the existing change is
> counter-intuitive to users who expect an immediate even distribution
> across _all_ partitions of the topic.
>
> However, I don't think it is a hack because it is almost exactly the
> same behavior as 0.7 in one of its modes. The 0.7 producer (which I
> think was even more confusing) had three modes:
> i) ZK send
> ii) Config send(a): static list of broker1:port1,broker2:port2,etc.
> iii) Config send(b): static list of a hardwareVIP:VIPport
>
> (i) and (ii) would achieve even distribution. (iii) would effectively
> select one broker and distribute to partitions on that broker within
> each reconnect interval. (iii) is very similar to what we now do in
> 0.8. (Although we currently stick to one partition during each metadata
> refresh interval, that can be changed to stick to one broker and
> distribute across the partitions on that broker.)
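Joel's closing parenthetical mentions a broker-sticky alternative to the current one-partition-per-interval behavior. The following is a minimal Python sketch of that idea, purely illustrative and not Kafka's Scala producer code: on each metadata refresh the producer picks one broker at random, then round-robins across the partitions that broker currently leads. The class name BrokerStickySelector, the `leaders` mapping (partition id to leader broker id, listing only partitions that currently have a leader) and the explicit `refresh()` hook are assumptions made for illustration.

```python
import random
from typing import Dict, List, Optional


class BrokerStickySelector:
    """Hypothetical sketch of the broker-sticky null-key strategy.

    Sticks to one broker per metadata refresh interval and round-robins
    over that broker's partitions. Not taken from Kafka's code base.
    """

    def __init__(self, leaders: Dict[int, int], rng: Optional[random.Random] = None):
        # Assumed input: partition id -> broker id of its current leader.
        # Partitions without a leader are simply left out of the mapping.
        self.leaders = leaders
        self.rng = rng or random.Random()
        self.refresh()

    def refresh(self) -> None:
        # Called on each (assumed) metadata refresh: pick a new random broker.
        brokers = sorted(set(self.leaders.values()))
        self.broker = self.rng.choice(brokers)
        self.partitions: List[int] = sorted(
            p for p, b in self.leaders.items() if b == self.broker
        )
        self.next_index = 0

    def next_partition(self) -> int:
        # Round-robin over the chosen broker's partitions until the next refresh.
        p = self.partitions[self.next_index % len(self.partitions)]
        self.next_index += 1
        return p
```

With the mapping {0: 1, 1: 2, 2: 1, 3: 2}, four consecutive calls return either 0, 2, 0, 2 or 1, 3, 1, 3 depending on which broker was picked, so a producer holds a single connection per interval but still spreads messages over that broker's partitions.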
>
> At the same time, I agree with Joe's suggestion that we should keep
> the more intuitive pre-KAFKA-1017 behavior as the default and move the
> change in KAFKA-1017 to a more specific partitioner implementation.
>
> Joel
>
>
> On Sun, Sep 15, 2013 at 8:44 AM, Jay Kreps <[EMAIL PROTECTED]> wrote:
>> Let me ask another question which I think is more objective. Let's say 100
>> random, smart infrastructure specialists try Kafka. Of these 100, how many
>> do you believe will:
>> 1. Say that this behavior is what they expected to happen?
>> 2. Be happy with this behavior?
>> I am not being facetious; I am genuinely looking for a numerical estimate. I
>> am trying to figure out if nobody thought about this or if my estimate is
>> just really different. For what it is worth my estimate is 0 and 5
>> respectively.
>>
>> This would be fine except that we changed it from the good behavior to the
>> bad behavior to fix an issue that probably only we have.
>>
>> -Jay
>>
>>
>> On Sun, Sep 15, 2013 at 8:37 AM, Jay Kreps <[EMAIL PROTECTED]> wrote:
>>
>>> I just took a look at this change. I agree with Joe, not to put too fine a
>>> point on it, but this is a confusing hack.
>>>
>>> Jun, I don't think wanting to minimize the number of TCP connections is
>>> going to be a very common need for people with fewer than 10k producers. I
>>> also don't think people are going to get very good load balancing out of
>>> this because most people don't have a ton of producers. I think instead we
>>> will spend the next year explaining this behavior, which 99% of people will
>>> think is a bug (because it is crazy, non-intuitive, and breaks their usage).
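Jay's load-balancing point can be made concrete with a small simulation. This is a simplified model under stated assumptions (one topic, all partitions available, every producer sends the same number of null-key messages within a single metadata refresh interval), and the function name is hypothetical. In sticky mode each producer draws one partition for the whole interval, so at most `num_producers` partitions receive data; in per-message mode the load spreads across all partitions.

```python
import random
from collections import Counter


def partitions_hit_in_one_interval(num_producers: int, num_partitions: int,
                                   messages_per_producer: int, sticky: bool,
                                   seed: int = 0) -> Counter:
    """Count messages per partition during one metadata refresh interval.

    sticky=True models the post-KAFKA-1017 null-key behavior (one random
    partition per producer for the whole interval); sticky=False models a
    fresh random choice per message. Illustrative model only.
    """
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(num_producers):
        # Drawn in both modes to keep the code path simple; used only if sticky.
        sticky_choice = rng.randrange(num_partitions)
        for _ in range(messages_per_producer):
            p = sticky_choice if sticky else rng.randrange(num_partitions)
            counts[p] += 1
    return counts
```

For example, partitions_hit_in_one_interval(3, 12, 1000, sticky=True) can touch at most 3 of the 12 partitions, while the same call with sticky=False almost surely touches all 12.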
>>>
>>> Why was this done by adding special default behavior in the null key case
>>> instead of as a partitioner? The argument that the partitioner interface
>>> doesn't have sufficient information to choose a partition is not a good
>>> argument for hacking in changes to the default; it is an argument for
>>> *improving* the partitioner interface.
>>>
>>> The whole point of a partitioner interface is to make it possible to plug
>>> in non-standard behavior like this, right?
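Jay's alternative, extending the partitioner interface instead of bypassing it for null keys, might look roughly like this Python sketch. The `available` argument (partitions that currently have a leader), the class names and the `reset()` hook for metadata refreshes are all hypothetical, and Python's built-in `hash` stands in for whatever key hash a real producer uses (for strings it is only stable within one process). The default keeps the pre-KAFKA-1017 per-message random spread, and the sticky behavior becomes an opt-in plug-in, in line with what Joel and Joe propose.

```python
import random
from abc import ABC, abstractmethod
from typing import Hashable, List, Optional


class Partitioner(ABC):
    """Hypothetical interface that is also told which partitions are available."""

    @abstractmethod
    def partition(self, key: Optional[Hashable], num_partitions: int,
                  available: List[int]) -> int: ...


class DefaultPartitioner(Partitioner):
    """Pre-KAFKA-1017 style: hash keyed messages; spread null keys randomly
    over the available partitions, one message at a time."""

    def __init__(self, rng: Optional[random.Random] = None):
        self.rng = rng or random.Random()

    def partition(self, key, num_partitions, available):
        if key is None:
            return self.rng.choice(available)
        return hash(key) % num_partitions  # Python's % is non-negative here


class StickyPartitioner(DefaultPartitioner):
    """KAFKA-1017 style as an opt-in plug-in: reuse one available partition
    for null keys until reset() is called (e.g. on a metadata refresh)."""

    def __init__(self, rng: Optional[random.Random] = None):
        super().__init__(rng)
        self.current: Optional[int] = None

    def reset(self) -> None:
        self.current = None

    def partition(self, key, num_partitions, available):
        if key is not None:
            return super().partition(key, num_partitions, available)
        # Pick a new partition if none is chosen yet or the old one went away.
        if self.current not in available:
            self.current = self.rng.choice(available)
        return self.current


def choose_partition(partitioner: Partitioner, key, num_partitions, available):
    # The send path always delegates to the partitioner, null keys included.
    return partitioner.partition(key, num_partitions, available)
```

Because the null-key decision lives inside a pluggable class, switching between the two behaviors becomes a configuration choice rather than a change to the producer's default send path.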
>>>
>>> -Jay
>>>
>>>
>>> On Sat, Sep 14, 2013 at 8:15 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
>>>
>>>> Joe,
>>>>
>>>> Thanks for bringing this up. I want to clarify this a bit.
>>>>
>>>> 1. Currently, the producer side logic is that if the partitioning key is
>>>> not provided (i.e., it is null), the partitioner won't be called. We did
>>>> that because we want to select a random and "available" partition to send
>>>> messages so that if some partitions are temporarily unavailable (because

 
Jun Rao 2013-09-23, 00:57
Joe Stein 2013-09-27, 16:24
Jun Rao 2013-09-27, 16:46
Joe Stein 2013-09-27, 16:54
Jun Rao 2013-09-27, 17:10
Joe Stein 2013-09-27, 17:31
Jun Rao 2013-09-28, 04:12
Guozhang Wang 2013-09-29, 04:52
Jun Rao 2013-09-29, 16:15
Joe Stein 2013-10-01, 05:22
Jun Rao 2013-10-01, 15:27
Joe Stein 2013-10-01, 15:35
Jun Rao 2013-10-01, 16:32