Kafka >> mail # dev >> Random Partitioning Issue


Re: Random Partitioning Issue
Joe,

Not sure I fully understand your proposal. Do you want to put the random
partitioning selection logic (for messages without a key) in the
partitioner without changing the partitioner api? That's difficult. The
issue is that in the current partitioner api, we don't know which
partitions are available. For example, if we have replication factor 1 on a
topic and a broker is down, the best thing to do for the random partitioner
is to select an available partition at random (assuming more than 1
partition is created for the topic).
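A minimal sketch of that selection, with hypothetical names (the current partitioner api has no view of which partitions are available, which is exactly the problem described above):

```scala
// Hypothetical sketch: pick a random partition among only those that
// currently have a live leader. With replication factor 1 and a broker
// down, partitions hosted on the dead broker are simply excluded.
// These names are assumptions, not the actual Kafka api.
import scala.util.Random

object AvailableRandomPartitioner {
  def partition(availablePartitions: Seq[Int]): Int = {
    require(availablePartitions.nonEmpty, "no partition is currently available")
    availablePartitions(Random.nextInt(availablePartitions.size))
  }
}
```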

Another option is to revert the logic in the random partitioning selection
logic in DefaultEventHandler to select a random partition per batch of
events (instead of sticking with a random partition for some configured
amount of time). This is doable, but I am not sure if it's that critical.
Since this is one of the two possible behaviors in 0.7, it's hard to say
whether people will be surprised by that. Preserving both behaviors in 0.7
will require changing the partitioner api. This is more work and I agree
it's better to do this post 0.8.0 final.
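For reference, the two behaviors being contrasted can be sketched roughly like this (hypothetical names, not the actual DefaultEventHandler code):

```scala
// Hypothetical sketch of the two random-selection policies discussed above:
// (a) per-batch: pick a fresh random partition for every batch of events;
// (b) time-based: stick with one random partition until a refresh
//     interval elapses, which keeps the number of socket connections low.
import scala.util.Random

class StickyRandomChooser(numPartitions: Int, refreshMs: Long) {
  private var current = Random.nextInt(numPartitions)
  private var lastSwitch = System.currentTimeMillis()

  // (b) time-based: re-pick only after refreshMs has passed
  def timeBased(): Int = {
    val now = System.currentTimeMillis()
    if (now - lastSwitch >= refreshMs) {
      current = Random.nextInt(numPartitions)
      lastSwitch = now
    }
    current
  }

  // (a) per-batch: re-pick at every batch boundary
  def perBatch(): Int = Random.nextInt(numPartitions)
}
```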

Thanks,

Jun

On Fri, Sep 27, 2013 at 9:24 AM, Joe Stein <[EMAIL PROTECTED]> wrote:

> Jun, can we hold this extra change over for 0.8.1 and just go with
> reverting to where we were before for the default, with a new partition per
> metadata refresh, and support both?
>
> I am not sure I entirely understand why someone would need the extra
> functionality you are talking about (which sounds cool, though). Adding it
> to the API (especially now) without people using it may just make folks ask
> more questions and maybe not use it ... IDK ... but in any case we can work
> on buttoning up 0.8 and shipping just the change for the two partitioners
> https://issues.apache.org/jira/browse/KAFKA-1067, and circle back, if we
> want, on this extra item (including the discussion) in 0.8.1 or greater?
>  I am always of the mind to reduce complexity unless that complexity is in
> fact better than not having it.
>
> On Sun, Sep 22, 2013 at 8:56 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
>
> > It's reasonable to make the behavior of random producers customizable
> > through a pluggable partitioner. So, if one doesn't care about # of
> socket
> > connections, one can choose to select a random partition on every send.
> If
> > one does have many producers, one can choose to periodically select a
> > random partition. To support this, the partitioner api needs to be
> changed
> > though.
> >
> > Instead of
> >   def partition(key: T, numPartitions: Int): Int
> >
> > we probably need something like the following:
> >   def partition(key: T, numPartitions: Int, availablePartitionList:
> > List[Int], isNewBatch: Boolean, isRefreshMetadata: Boolean): Int
> >
> > availablePartitionList: allows us to select only partitions that are
> > available.
> > isNewBatch: allows us to select the same partition for all messages in a
> > given batch in the async mode.
> > isRefreshMetadata: allows us to implement the policy of switching to a
> > random partition periodically.
> >
> > This will make the partitioner api a bit more complicated. However, it
> does
> > provide enough information for customization.
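To illustrate, a partitioner built against the extended signature above might look roughly like this (everything beyond the quoted signature is an assumption, not actual Kafka code):

```scala
// Hypothetical sketch of a partitioner implementing the extended api
// quoted above: keyed messages hash as usual; keyless messages stick
// with one available partition and switch on metadata refresh.
import scala.util.Random

trait Partitioner[T] {
  def partition(key: T, numPartitions: Int, availablePartitionList: List[Int],
                isNewBatch: Boolean, isRefreshMetadata: Boolean): Int
}

class PeriodicRandomPartitioner[T] extends Partitioner[T] {
  private var sticky: Int = -1  // current random choice for keyless sends

  def partition(key: T, numPartitions: Int, availablePartitionList: List[Int],
                isNewBatch: Boolean, isRefreshMetadata: Boolean): Int = {
    if (key != null)
      math.abs(key.hashCode) % numPartitions  // keyed: deterministic hash
    else {
      // keyless: switch to a new random available partition only on refresh
      if (sticky < 0 || isRefreshMetadata)
        sticky = availablePartitionList(Random.nextInt(availablePartitionList.size))
      sticky
    }
  }
}
```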
> >
> > Thanks,
> >
> > Jun
> >
> >
> >
> > On Wed, Sep 18, 2013 at 4:23 PM, Joe Stein <[EMAIL PROTECTED]> wrote:
> >
> > > Sounds good, I will create a JIRA and upload a patch.
> > >
> > >
> > > /*******************************************
> > >  Joe Stein
> > >  Founder, Principal Consultant
> > >  Big Data Open Source Security LLC
> > >  http://www.stealth.ly
> > >  Twitter: @allthingshadoop
> > > ********************************************/
> > >
> > >
> > > On Sep 17, 2013, at 1:19 PM, Joel Koshy <[EMAIL PROTECTED]> wrote:
> > >
> > > > I agree that minimizing the number of producer connections (while
> > > > being a good thing) is really required in very large production
> > > > deployments, and the net-effect of the existing change is
> > > > counter-intuitive to users who expect an immediate even distribution

 