Kafka, mail # dev - Random Partitioning Issue


Re: Random Partitioning Issue
Joe Stein 2013-09-27, 16:24
Jun, can we hold this extra change over for 0.8.1 and just go with
reverting to where we were before for the default, with a new partitioner
for metadata refresh, and support both?

I am not sure I entirely understand why someone would need the extra
functionality you are talking about, though it sounds cool. Adding it
to the API (especially now) without people using it may just make folks ask
more questions and maybe not use it. In any case, we can work
on buttoning up 0.8 and shipping just the change for two partitioners
(https://issues.apache.org/jira/browse/KAFKA-1067) and circle back, if we
want, on this extra item (including the discussion) in 0.8.1 or greater.
I am always of the mind to reduce complexity unless that complexity is in
fact better than not having it.

On Sun, Sep 22, 2013 at 8:56 PM, Jun Rao <[EMAIL PROTECTED]> wrote:

> It's reasonable to make the behavior of random producers customizable
> through a pluggable partitioner. So, if one doesn't care about # of socket
> connections, one can choose to select a random partition on every send. If
> one does have many producers, one can choose to periodically select a
> random partition. To support this, the partitioner api needs to be changed
> though.
>
> Instead of
>   def partition(key: T, numPartitions: Int): Int
>
> we probably need something like the following:
>   def partition(key: T, numPartitions: Int, availablePartitionList: List[Int],
>                 isNewBatch: Boolean, isRefreshMetadata: Boolean): Int
>
> availablePartitionList: allows us to select only partitions that are
> available.
> isNewBatch: allows us to select the same partition for all messages in a
> given batch in the async mode.
> isRefreshMetadata: allows us to implement the policy of switching to a
> random partition periodically.
>
> This will make the partitioner api a bit more complicated. However, it does
> provide enough information for customization.
>
> Thanks,
>
> Jun
>
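A minimal Scala sketch of how the signature proposed above could support both policies under discussion. The trait and class names are hypothetical and are not part of any Kafka release; the sketch assumes availablePartitionList is non-empty, and neither policy shown here uses isNewBatch.

```scala
import scala.util.Random

// Hypothetical trait mirroring the proposed signature; not a shipped Kafka API.
trait ExtendedPartitioner[T] {
  def partition(key: T,
                numPartitions: Int,
                availablePartitionList: List[Int],
                isNewBatch: Boolean,
                isRefreshMetadata: Boolean): Int
}

// Policy 1: choose a random available partition on every send.
class RandomPerSendPartitioner[T](rng: Random = new Random)
    extends ExtendedPartitioner[T] {
  def partition(key: T, numPartitions: Int, availablePartitionList: List[Int],
                isNewBatch: Boolean, isRefreshMetadata: Boolean): Int = {
    require(availablePartitionList.nonEmpty, "no available partitions")
    availablePartitionList(rng.nextInt(availablePartitionList.size))
  }
}

// Policy 2: keep one partition and re-pick only on a metadata refresh,
// or when the current choice is no longer in the available list.
class StickyUntilRefreshPartitioner[T](rng: Random = new Random)
    extends ExtendedPartitioner[T] {
  private var current: Option[Int] = None

  def partition(key: T, numPartitions: Int, availablePartitionList: List[Int],
                isNewBatch: Boolean, isRefreshMetadata: Boolean): Int = synchronized {
    require(availablePartitionList.nonEmpty, "no available partitions")
    val stillValid = current.exists(p => availablePartitionList.contains(p))
    if (isRefreshMetadata || !stillValid)
      current = Some(availablePartitionList(rng.nextInt(availablePartitionList.size)))
    current.get
  }
}
```

With this shape, a deployment that does not care about the number of socket connections would plug in the first class, and one with many producers would plug in the second.
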
>
>
> On Wed, Sep 18, 2013 at 4:23 PM, Joe Stein <[EMAIL PROTECTED]> wrote:
>
> > Sounds good, I will create a JIRA and upload a patch.
> >
> >
> > /*******************************************
> >  Joe Stein
> >  Founder, Principal Consultant
> >  Big Data Open Source Security LLC
> >  http://www.stealth.ly
> >  Twitter: @allthingshadoop
> > ********************************************/
> >
> >
> > On Sep 17, 2013, at 1:19 PM, Joel Koshy <[EMAIL PROTECTED]> wrote:
> >
> > > > I agree that minimizing the number of producer connections (while
> > > > being a good thing) is really required only in very large production
> > > > deployments, and the net effect of the existing change is
> > > > counter-intuitive to users who expect an immediate even distribution
> > > > across _all_ partitions of the topic.
> > >
> > > However, I don't think it is a hack because it is almost exactly the
> > > same behavior as 0.7 in one of its modes. The 0.7 producer (which I
> > > think was even more confusing) had three modes:
> > > i) ZK send
> > > ii) Config send(a): static list of broker1:port1,broker2:port2,etc.
> > > iii) Config send(b): static list of a hardwareVIP:VIPport
> > >
> > > (i) and (ii) would achieve even distribution. (iii) would effectively
> > > select one broker and distribute to partitions on that broker within
> > > each reconnect interval. (iii) is very similar to what we now do in
> > > > 0.8. (Although we stick to one partition during each metadata refresh
> > > > interval, that can be changed to stick to one broker and distribute
> > > > across partitions on that broker.)
> > >
> > > At the same time, I agree with Joe's suggestion that we should keep
> > > the more intuitive pre-KAFKA-1017 behavior as the default and move the
> > > change in KAFKA-1017 to a more specific partitioner implementation.
> > >
> > > Joel
> > >
> > >
> > > > On Sun, Sep 15, 2013 at 8:44 AM, Jay Kreps <[EMAIL PROTECTED]> wrote:
> > > >> Let me ask another question which I think is more objective. Let's say 100
> > >> random, smart infrastructure specialists try Kafka, of these 100 how