Kafka >> mail # dev >> Random Partitioning Issue


Re: Random Partitioning Issue
Sounds good, I will create a JIRA and upload a patch.
/*******************************************
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop
********************************************/
On Sep 17, 2013, at 1:19 PM, Joel Koshy <[EMAIL PROTECTED]> wrote:

> I agree that minimizing the number of producer connections (while
> being a good thing) is really only required in very large production
> deployments, and the net effect of the existing change is
> counter-intuitive to users who expect an immediate even distribution
> across _all_ partitions of the topic.
>
> However, I don't think it is a hack because it is almost exactly the
> same behavior as 0.7 in one of its modes. The 0.7 producer (which I
> think was even more confusing) had three modes:
> i) ZK send
> ii) Config send(a): static list of broker1:port1,broker2:port2,etc.
> iii) Config send(b): static list of a hardwareVIP:VIPport
>
> (i) and (ii) would achieve even distribution. (iii) would effectively
> select one broker and distribute to partitions on that broker within
> each reconnect interval. (iii) is very similar to what we now do in
> 0.8. (Although we currently stick to one partition during each
> metadata refresh interval, that could be changed to stick to one
> broker and distribute across partitions on that broker.)
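A minimal sketch of the contrast described above, using hypothetical names rather than Kafka's actual internals: per-message random choice spreads load across all partitions immediately, while the post-KAFKA-1017 behavior sticks to one randomly chosen partition per metadata refresh interval.

    import java.util.Random;

    public class PartitionSelectionSketch {
        private static final Random RAND = new Random();
        private static int stickyPartition = -1;
        private static long lastRefreshMs = 0;

        // Pre-KAFKA-1017 style: a fresh random partition per message,
        // giving an immediately even spread across all partitions.
        static int perMessageRandom(int numPartitions) {
            return RAND.nextInt(numPartitions);
        }

        // Post-KAFKA-1017 style: pick one random partition and reuse it
        // until the refresh interval elapses, so the producer only needs
        // a connection to that partition's leader in the meantime.
        static int stickyRandom(int numPartitions, long refreshIntervalMs) {
            long now = System.currentTimeMillis();
            if (stickyPartition < 0 || now - lastRefreshMs > refreshIntervalMs) {
                stickyPartition = RAND.nextInt(numPartitions);
                lastRefreshMs = now;
            }
            return stickyPartition;
        }
    }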
>
> At the same time, I agree with Joe's suggestion that we should keep
> the more intuitive pre-KAFKA-1017 behavior as the default and move the
> change in KAFKA-1017 to a more specific partitioner implementation.
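If the KAFKA-1017 behavior were moved behind a specific partitioner as suggested here, opting in might look like the following; the class name is hypothetical, while partitioner.class is the 0.8 producer config property:

    # producer config (hypothetical opt-in)
    partitioner.class=example.ConnectionMinimizingPartitioner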
>
> Joel
>
>
> On Sun, Sep 15, 2013 at 8:44 AM, Jay Kreps <[EMAIL PROTECTED]> wrote:
>> Let me ask another question which I think is more objective. Let's say 100
>> random, smart infrastructure specialists try Kafka; of these 100, how many
>> do you believe will
>> 1. Say that this behavior is what they expected to happen?
>> 2. Be happy with this behavior?
>> I am not being facetious; I am genuinely looking for a numerical estimate. I
>> am trying to figure out if nobody thought about this or if my estimate is
>> just really different. For what it is worth, my estimate is 0 and 5,
>> respectively.
>>
>> This would be fine except that we changed it from the good behavior to the
>> bad behavior to fix an issue that probably only we have.
>>
>> -Jay
>>
>>
>> On Sun, Sep 15, 2013 at 8:37 AM, Jay Kreps <[EMAIL PROTECTED]> wrote:
>>
>>> I just took a look at this change. I agree with Joe, not to put too fine a
>>> point on it, but this is a confusing hack.
>>>
>>> Jun, I don't think wanting to minimize the number of TCP connections is
>>> going to be a very common need for people with fewer than 10k producers. I
>>> also don't think people are going to get very good load balancing out of
>>> this because most people don't have a ton of producers. I think instead we
>>> will spend the next year explaining this behavior, which 99% of people will
>>> think is a bug (because it is crazy, non-intuitive, and breaks their usage).
>>>
>>> Why was this done by adding special default behavior in the null-key case
>>> instead of as a partitioner? The argument that the partitioner interface
>>> doesn't have sufficient information to choose a partition is not a good
>>> argument for hacking in changes to the default; it is an argument for
>>> *improving* the partitioner interface.
>>>
>>> The whole point of a partitioner interface is to make it possible to plug
>>> in non-standard behavior like this, right?
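For illustration, the connection-minimizing strategy written as a pluggable partitioner: a sketch assuming a 0.8-style contract of the form partition(key, numPartitions), not the project's actual implementation. It is the hypothetical class referenced in the config example above.

    import java.util.Random;

    // Hypothetical plug-in partitioner; Kafka 0.8's actual interface is
    // kafka.producer.Partitioner, and this signature only mirrors it.
    public class ConnectionMinimizingPartitioner {
        private final Random rand = new Random();
        private final long refreshIntervalMs;
        private int cachedPartition = -1;
        private long lastChosenMs = 0;

        public ConnectionMinimizingPartitioner(long refreshIntervalMs) {
            this.refreshIntervalMs = refreshIntervalMs;
        }

        // Map a (possibly null) key to a partition id in [0, numPartitions).
        public int partition(Object key, int numPartitions) {
            if (key != null) {
                // Keyed messages hash deterministically as usual.
                return (key.hashCode() & 0x7fffffff) % numPartitions;
            }
            long now = System.currentTimeMillis();
            if (cachedPartition < 0 || now - lastChosenMs > refreshIntervalMs) {
                // Null key: stick to one random partition per interval to
                // keep the producer's connection count low.
                cachedPartition = rand.nextInt(numPartitions);
                lastChosenMs = now;
            }
            return cachedPartition;
        }
    }

Producers that want the even-spread default would simply never configure it.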
>>>
>>> -Jay
>>>
>>>
>>> On Sat, Sep 14, 2013 at 8:15 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
>>>
>>>> Joe,
>>>>
>>>> Thanks for bringing this up. I want to clarify this a bit.
>>>>
>>>> 1. Currently, the producer side logic is that if the partitioning key is
>>>> not provided (i.e., it is null), the partitioner won't be called. We did
>>>> that because we want to select a random and "available" partition to send
>>>> messages so that if some partitions are temporarily unavailable (because

 