Kafka, mail # user - Re: Producer not distributing across all partitions


Re: Producer not distributing across all partitions
Jun Rao 2013-09-14, 03:47
Without the fix in KAFKA-1017, the issue is that each producer maintains
min(#partitions, #brokers) socket connections. If you have lots of
producers, the number of open file handles on the broker could be an issue.
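
As a rough back-of-the-envelope illustration of the connection count described above, here is a small sketch. The function name and the cluster numbers are made up for illustration, and it assumes partition leaders are spread evenly over the brokers:

```python
def connections_per_producer(num_partitions: int, num_brokers: int) -> int:
    """Upper bound on the sockets one producer keeps open when it spreads
    messages over every partition (assumes leaders are spread over brokers)."""
    return min(num_partitions, num_brokers)


# Hypothetical cluster: 1000 producers, 12 partitions, 4 brokers.
producers = 1000
per_producer = connections_per_producer(12, 4)
total = producers * per_producer  # connections summed over all brokers
print(per_producer, total)
```

If each producer instead sticks to a single partition at a time, it needs roughly one connection at any moment, which is where the saving comes from.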

So, what KAFKA-1017 does is pick a random partition and stick to it for
a configurable amount of time, and then switch to another random partition.
This is the behavior in 0.7 when a load balancer is used, and it reduces the
number of socket connections significantly.
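
The "pick a random partition and stick to it" behavior described above can be sketched as a toy model. This is not the Kafka producer code; the class name, the injectable clock, and the 600-second interval are assumptions chosen only for illustration:

```python
import random
import time


class StickyRandomPartitioner:
    """Toy model: pick a random partition and reuse it until the interval expires."""

    def __init__(self, num_partitions, stick_seconds, clock=time.monotonic, rng=None):
        self.num_partitions = num_partitions
        self.stick_seconds = stick_seconds
        self.clock = clock                  # injectable so the example is deterministic
        self.rng = rng or random.Random()
        self.current = None                 # partition currently in use
        self.chosen_at = None               # time at which it was picked

    def partition(self):
        now = self.clock()
        expired = self.chosen_at is None or now - self.chosen_at >= self.stick_seconds
        if expired:
            # Re-draw at random; the new pick may be the same partition by chance.
            self.current = self.rng.randrange(self.num_partitions)
            self.chosen_at = now
        return self.current


# Usage with a fake clock (hypothetical values).
fake_now = [0.0]
p = StickyRandomPartitioner(8, stick_seconds=600, clock=lambda: fake_now[0],
                            rng=random.Random(0))
first = [p.partition() for _ in range(5)]  # same partition for the whole interval
fake_now[0] = 600.0
later = p.partition()                      # interval expired, so a fresh random pick
```

Under this model a single producer writes to only one partition during any one interval, so an even spread shows up only across many producers or many intervals.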

The issue you are reporting seems like a bug though. Which revision in 0.8
are you using?

Thanks,

Jun
On Fri, Sep 13, 2013 at 8:28 PM, prashant amar <[EMAIL PROTECTED]> wrote:

> Hi Guozhang, Joe, Drew
>
> In our case we have been running for the past 3 weeks and it has been
> consistently writing only to the first partition. The rest of the
> partitions have empty index files.
>
> Not sure if I am hitting any issue here.
>
> I am using the offset checker as my barometer. I also introspected the
> folder, and it indicates the same.
>
> On Friday, September 13, 2013, Guozhang Wang wrote:
>
> > Hello Joe,
> >
> > The reasons we make producers produce to a fixed partition within each
> > metadata-refresh interval are the following:
> >
> > https://issues.apache.org/jira/browse/KAFKA-1017
> >
> > https://issues.apache.org/jira/browse/KAFKA-959
> >
> > So, in short, the randomness is still preserved, but within one
> > metadata-refresh interval the assignment is fixed.
> >
> > I agree that the documentation should be updated accordingly.
> >
> > Guozhang
> >
> >
> > On Fri, Sep 13, 2013 at 1:48 PM, Joe Stein <[EMAIL PROTECTED]> wrote:
> >
> > > Isn't this a bug?
> > >
> > > I don't see why we would want users to have to code and generate random
> > > partition keys to randomly distribute the data to partitions. That is
> > > Kafka's job, isn't it?
> > >
> > > Or, if a null value is supplied, tell the user this is not supported
> > > (throw an exception) in KeyedMessage like we do for topic, and not treat
> > > null as a key to hash?
> > >
> > > My preference is to put those three lines back in and let the key be
> > > null and give folks randomness, unless it's not a bug and there is a
> > > good reason for it?
> > >
> > > Is there something about
> > > https://issues.apache.org/jira/browse/KAFKA-691 that requires the lines
> > > to be taken out? I haven't had a chance to look through it yet.
> > >
> > > My thought is that a new person coming in would expect to see the
> > > partitions filling up in a round-robin fashion, as our docs say, unless
> > > we force them in the API to know they have to do this or give them the
> > > ability for this to happen when passing nothing in.
> > >
> > > /*******************************************
> > >  Joe Stein
> > >  Founder, Principal Consultant
> > >  Big Data Open Source Security LLC
> > >  http://www.stealth.ly
> > >  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> > > ********************************************/
> > >
> > >
> > > On Fri, Sep 13, 2013 at 4:17 PM, Drew Goya <[EMAIL PROTECTED]> wrote:
> > >
> > > > I ran into this problem as well, Prashant.  The default partition key
> > > > was recently changed:
> > > >
> > > > https://github.com/apache/kafka/commit/b71e6dc352770f22daec0c9a3682138666f032be
> > > >
> > > > It no longer assigns a random partition to data with a null partition
> > > > key.  I had to change my code to generate random partition keys to get
> > > > the randomly distributed behavior the producer used to have.
> > > >
> > > >
> > > > On Fri, Sep 13, 2013 at 11:42 AM, prashant amar <[EMAIL PROTECTED]>
> > > > wrote:
> > > >
> > > > > Thanks Neha
> > > > >
> > > > > I will try applying this property and circle back.
> > > > >
> > > > > Also, I have been attempting to execute kafka-producer-perf-test.sh
> > > > > and I receive the following error:
> > > > >
> > > > >        Error: Could not find or load main class kafka.perf.ProducerPerformance