Kafka >> mail # user >> Re: Producer not distributing across all partitions


Re: Producer not distributing across all partitions
Hello Joe,

The reasons we make producers send to a fixed partition for each
metadata-refresh interval are the following:

https://issues.apache.org/jira/browse/KAFKA-1017

https://issues.apache.org/jira/browse/KAFKA-959

In short, the randomness is still preserved, but within one
metadata-refresh interval the assignment is fixed.
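
To make the behavior concrete, here is a minimal simulation of it, not the
producer's actual code. It assumes a null-key message goes to a partition
that is drawn at random once per refresh interval and reused until the next
refresh. The function name, the fixed interval boundaries, and the
one-message-per-minute schedule are all illustrative assumptions.

```python
import random


def pick_partitions(send_times_ms, num_partitions, refresh_interval_ms, rng):
    """Return the partition used for each send time (null-key messages).

    Simplified model: one random draw per refresh interval, reused for
    every message sent inside that interval.
    """
    chosen = []
    current_interval = None
    current_partition = None
    for t in send_times_ms:
        interval = t // refresh_interval_ms
        if interval != current_interval:
            # Simulated metadata refresh: re-draw the sticky partition.
            current_interval = interval
            current_partition = rng.randrange(num_partitions)
        chosen.append(current_partition)
    return chosen


rng = random.Random(0)
# One message per minute for 30 minutes, with a 10-minute refresh interval
# (the default mentioned in this thread) and 4 partitions.
sends = [minute * 60_000 for minute in range(30)]
parts = pick_partitions(sends, num_partitions=4,
                        refresh_interval_ms=600_000, rng=rng)
```

In this model every message within the same 10-minute window lands on one
partition, so a short test run can look as if only one partition is in use.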

I agree that the documentation should be updated accordingly.

Guozhang
On Fri, Sep 13, 2013 at 1:48 PM, Joe Stein <[EMAIL PROTECTED]> wrote:

> Isn't this a bug?
>
> I don't see why we would want users to have to code and generate random
> partition keys to randomly distribute the data to partitions; that is
> Kafka's job, isn't it?
>
> Or, if a null value is supplied, tell the user this is not supported (throw
> an exception) in KeyedMessage, as we do for topic, rather than treating null
> as a key to hash?
>
> My preference is to put those three lines back in and let key be null and
> give folks randomness, unless it's not a bug and there is a good reason for
> it?
>
> Is there something about
> https://issues.apache.org/jira/browse/KAFKA-691 that requires the lines
> taken out? I haven't had a chance to look through
> it yet.
>
> My thought is that a new person coming in would expect to see the
> partitions filling up in a round-robin fashion, as our docs say, unless
> we force them in the API to know they have to do this, or give them the
> ability for this to happen when passing nothing in.
>
> /*******************************************
>  Joe Stein
>  Founder, Principal Consultant
>  Big Data Open Source Security LLC
>  http://www.stealth.ly
>  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> ********************************************/
>
>
> On Fri, Sep 13, 2013 at 4:17 PM, Drew Goya <[EMAIL PROTECTED]> wrote:
>
> > I ran into this problem as well Prashant.  The default partition key was
> > recently changed:
> >
> >
> >
> > https://github.com/apache/kafka/commit/b71e6dc352770f22daec0c9a3682138666f032be
> >
> > It no longer assigns a random partition to data with a null partition key.
> > I had to change my code to generate random partition keys to get the
> > randomly distributed behavior the producer used to have.
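
The workaround described here could look roughly like the sketch below. It
is only an illustration under stated assumptions: `random_partition_key` and
`partition_for` are hypothetical helpers, and the CRC32-based hash is a
stand-in for whatever partitioner the producer is configured with, not
Kafka's own hashing.

```python
import random
import zlib
from collections import Counter


def random_partition_key(rng):
    # Hypothetical helper: any value that varies per message will do.
    return str(rng.getrandbits(64))


def partition_for(key, num_partitions):
    # Stand-in hash partitioner (assumption): CRC32 of the key, modulo the
    # partition count. The real producer hashes keys in its own way.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


rng = random.Random(0)
num_partitions = 4
# Count how many of 1000 randomly keyed messages land on each partition.
counts = Counter(
    partition_for(random_partition_key(rng), num_partitions)
    for _ in range(1000)
)
```

Because each message carries a fresh key, consecutive messages spread over
the partitions instead of sticking to one until the next metadata refresh.
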
> >
> >
> > On Fri, Sep 13, 2013 at 11:42 AM, prashant amar <[EMAIL PROTECTED]>
> > wrote:
> >
> > > Thanks Neha
> > >
> > > I will try applying this property and circle back.
> > >
> > > Also, I have been attempting to execute kafka-producer-perf-test.sh and I
> > > receive the following error:
> > >
> > >        Error: Could not find or load main class
> > > kafka.perf.ProducerPerformance
> > >
> > > I am running against 0.8.0-beta1
> > >
> > > Seems like perf is a separate project in the workspace.
> > >
> > > Does sbt package-assembly bundle the perf jar as well?
> > >
> > > Neither producer-perf-test nor consumer-test is working with this build.
> > >
> > >
> > >
> > > On Fri, Sep 13, 2013 at 9:56 AM, Neha Narkhede <[EMAIL PROTECTED]> wrote:
> > >
> > > > As Jun suggested, one reason could be that
> > > > topic.metadata.refresh.interval.ms is too high. Did you observe whether the
> > > > distribution improves after topic.metadata.refresh.interval.ms has passed?
> > > >
> > > > Thanks
> > > > Neha
> > > >
> > > >
> > > > On Fri, Sep 13, 2013 at 4:47 AM, prashant amar <[EMAIL PROTECTED]>
> > > > wrote:
> > > >
> > > > > I am using kafka 08 version ...
> > > > >
> > > > >
> > > > > On Thu, Sep 12, 2013 at 8:44 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > Which revision of 0.8 are you using? In a recent change, a producer will
> > > > > > stick to a partition for topic.metadata.refresh.interval.ms (defaults to
> > > > > > 10 mins) before picking another partition at random.
> > > > > > Thanks,
> > > > > > Jun
> > > > > >
> > > > > >
> > > > > > On Thu, Sep 12, 2013 at 1:56 PM, prashant amar <[EMAIL PROTECTED]>
> > > > > > wrote:
> > > > > >
> > > > > > > I created a topic with 4 partitions and for some reason the
 