Kafka >> mail # dev >> Producer not distributing across all partitions


prashant amar 2013-09-13, 02:27
Neha Narkhede 2013-09-13, 09:04
Re: Producer not distributing across all partitions
Isn't this a bug?

I don't see why we would want users to have to write code that generates random
partition keys just to distribute data randomly across partitions; that is
Kafka's job, isn't it?

Or, if supplying a null key is not supported, tell the user so (throw an
exception) in KeyedMessage, like we do for the topic, rather than treating
null as a key to hash?

My preference is to put those three lines back in, let the key be null, and
give folks randomness, unless it's not a bug and there is a good reason for
the change.

Is there something about
https://issues.apache.org/jira/browse/KAFKA-691 that requires those lines to
be taken out? I haven't had a chance to look through it yet.

My thought is that a new person coming in would expect to see the partitions
filling up in a round-robin fashion, as our docs say. We should either force
them, through the API, to know they have to do this themselves, or give them
that behavior when they pass nothing in.

/*******************************************
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
********************************************/
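For context on the behavior being debated: the old default hashed a non-null key to a partition and picked a random partition for a null key. A minimal sketch of those two strategies (illustrative only, not Kafka's actual partitioner code; the class and method names are made up for this example):

```java
import java.util.Random;

/** Illustrative sketch, not Kafka source: how a default partitioner
 *  might map messages onto one of N partitions. */
public class PartitionSketch {
    private static final Random RNG = new Random();

    // Keyed message: deterministic hash partitioning (same key -> same partition).
    public static int partitionForKey(Object key, int numPartitions) {
        return Math.abs(key.hashCode() % numPartitions);
    }

    // Null key, old behavior: a fresh random partition per message.
    public static int partitionForNullKeyOld(int numPartitions) {
        return RNG.nextInt(numPartitions);
    }

    public static void main(String[] args) {
        // Same key always lands on the same partition.
        System.out.println(partitionForKey("user-42", 4) == partitionForKey("user-42", 4));
    }
}
```

The debate is about which of these (or neither, i.e. an exception) a null key should get by default.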
On Fri, Sep 13, 2013 at 4:17 PM, Drew Goya <[EMAIL PROTECTED]> wrote:

> I ran into this problem as well Prashant.  The default partition key was
> recently changed:
>
>
> https://github.com/apache/kafka/commit/b71e6dc352770f22daec0c9a3682138666f032be
>
> It no longer assigns a random partition to data with a null partition key.
>  I had to change my code to generate random partition keys to get the
> randomly distributed behavior the producer used to have.
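The workaround Drew describes, generating a throwaway random key per message so the hash partitioner spreads traffic, can be sketched like this. The modulo-hash partition math below is an assumption standing in for Kafka's default partitioner, not the real 0.8 producer API:

```java
import java.util.UUID;

/** Sketch of the random-key workaround: each message gets a fresh UUID key,
 *  so hash partitioning spreads sends across partitions. Illustrative only. */
public class RandomKeySpread {
    // Assumed modulo-hash partitioner, standing in for Kafka's default.
    static int partition(String key, int numPartitions) {
        return Math.abs(key.hashCode() % numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        boolean[] hit = new boolean[numPartitions];
        // A fresh random key per message lands sends on many partitions.
        for (int i = 0; i < 1000; i++) {
            hit[partition(UUID.randomUUID().toString(), numPartitions)] = true;
        }
        int used = 0;
        for (boolean h : hit) if (h) used++;
        System.out.println(used);
    }
}
```

With 1000 random keys over 4 partitions, all 4 are hit essentially always; the cost is that the keys are meaningless and add bytes to every message.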
>
>
> On Fri, Sep 13, 2013 at 11:42 AM, prashant amar <[EMAIL PROTECTED]>
> wrote:
>
> > Thanks Neha
> >
> > I will try applying this property and circle back.
> >
> > Also, I have been attempting to execute kafka-producer-perf-test.sh and I
> > receive the following error
> >
> >        Error: Could not find or load main class
> > kafka.perf.ProducerPerformance
> >
> > I am running against 0.8.0-beta1
> >
> > Seems like perf is a separate project in the workspace.
> >
> > Does sbt package-assembly bundle the perf jar as well?
> >
> > Neither producer-perf-test nor consumer-test is working with this build
> >
> >
> >
> > On Fri, Sep 13, 2013 at 9:56 AM, Neha Narkhede <[EMAIL PROTECTED]> wrote:
> >
> > > As Jun suggested, one reason could be that the
> > > topic.metadata.refresh.interval.ms is too high. Did you observe if the
> > > distribution improves after topic.metadata.refresh.interval.ms has
> > > passed?
> > >
> > > Thanks
> > > Neha
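For anyone trying Neha's suggestion, a sketch of a 0.8-style producer config that lowers the refresh interval; the broker list and serializer values are placeholders, and 60 s is an arbitrary example value, not a recommendation:

```java
import java.util.Properties;

/** Sketch of 0.8 producer properties with a lowered metadata refresh
 *  interval, so null-keyed traffic rotates partitions sooner. */
public class ProducerConfigSketch {
    public static Properties build() {
        Properties props = new Properties();
        // Placeholder broker and serializer values for a 0.8 setup.
        props.put("metadata.broker.list", "localhost:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        // Lower the sticky-partition window from the 10-minute default
        // (600000 ms) to 60 seconds.
        props.put("topic.metadata.refresh.interval.ms", "60000");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build().getProperty("topic.metadata.refresh.interval.ms"));
    }
}
```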
> > >
> > >
> > > On Fri, Sep 13, 2013 at 4:47 AM, prashant amar <[EMAIL PROTECTED]>
> > > wrote:
> > >
> > > > I am using Kafka 0.8 ...
> > > >
> > > >
> > > > On Thu, Sep 12, 2013 at 8:44 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Which revision of 0.8 are you using? In a recent change, a producer
> > > > > will stick to a partition for topic.metadata.refresh.interval.ms
> > > > > (defaults to 10 mins) before picking another partition at random.
> > > > > Thanks,
> > > > > Jun
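The sticky behavior Jun describes can be modeled in a few lines; this is an illustrative model of the described behavior, not Kafka source, and the class and method names are invented for the sketch:

```java
import java.util.Random;

/** Model of the described behavior: a null-keyed producer reuses one
 *  randomly chosen partition until the metadata refresh interval elapses,
 *  then picks again at random. Illustrative only, not Kafka source. */
public class StickyPartitionModel {
    private final Random rng = new Random();
    private final int numPartitions;
    private final long refreshIntervalMs;
    private int current;
    private boolean picked = false;
    private long lastPickMs;

    public StickyPartitionModel(int numPartitions, long refreshIntervalMs) {
        this.numPartitions = numPartitions;
        this.refreshIntervalMs = refreshIntervalMs;
    }

    public int partitionAt(long nowMs) {
        // Re-pick only on the first send or after the interval elapses.
        if (!picked || nowMs - lastPickMs >= refreshIntervalMs) {
            current = rng.nextInt(numPartitions);
            lastPickMs = nowMs;
            picked = true;
        }
        return current;
    }

    public static void main(String[] args) {
        StickyPartitionModel m = new StickyPartitionModel(4, 600_000); // 10-min default
        int first = m.partitionAt(0);
        // Every send inside the 10-minute window hits the same partition.
        System.out.println(first == m.partitionAt(1_000) && first == m.partitionAt(599_999));
    }
}
```

This is why, with four partitions and the default interval, everything can land on one partition for ten minutes at a stretch.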
> > > > >
> > > > >
> > > > > On Thu, Sep 12, 2013 at 1:56 PM, prashant amar <[EMAIL PROTECTED]>
> > > > > wrote:
> > > > >
> > > > > > I created a topic with 4 partitions and for some reason the
> > producer
> > > is
> > > > > > pushing only to one partition.
> > > > > >
> > > > > > This is consistently happening across all topics that I created
> ...
> > > > > >
> > > > > > Is there a specific configuration that I need to apply to ensure
> > that
> > > > > load
> > > > > > is evenly distributed across all partitions?
> > > > > >
> > > > > >
> > > > > > Group           Topic           Pid  Offset  logSize  Lag  Owner
> > > > > > perfgroup1      perfpayload1    0    10965   11220    255  perfgroup1_XXXX-0
> > > > > > perfgroup1      perfpayload1    1    0