Kafka >> mail # user >> Re: Producer not distributing across all partitions


Re: Producer not distributing across all partitions
Without the fix in KAFKA-1017, each producer maintains
min(#partitions, #brokers) socket connections. If you have lots of
producers, the number of open file handles on the broker could become an
issue.

So, what KAFKA-1017 does is pick a random partition and stick to it for
a configurable amount of time, and then switch to another random partition.
This is the behavior in 0.7 when a load balancer is used, and it reduces the
number of socket connections significantly.
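
The behavior described above can be sketched as a toy model. This is not Kafka's producer code: the class name, the `partitions_used` helper, and the 600-second interval are assumptions chosen only for illustration. It shows why a short observation window sees a single partition, while a long one should see several.

```python
import random


class StickyRandomPartitioner:
    """Toy model (not Kafka code): pick a random partition, reuse it for an interval."""

    def __init__(self, num_partitions, refresh_interval_s, rng=None):
        if num_partitions <= 0:
            raise ValueError("num_partitions must be positive")
        self.num_partitions = num_partitions
        self.refresh_interval_s = refresh_interval_s  # hypothetical "sticky" window
        self.rng = rng or random.Random()
        self._current = None
        self._chosen_at = None

    def partition(self, now_s):
        """Return the partition for a keyless message sent at time now_s (seconds)."""
        expired = (
            self._chosen_at is None
            or now_s - self._chosen_at >= self.refresh_interval_s
        )
        if expired:
            # Interval elapsed (or first send): choose a new random partition.
            self._current = self.rng.randrange(self.num_partitions)
            self._chosen_at = now_s
        return self._current


def partitions_used(partitioner, send_times):
    """Return the set of partitions hit by messages sent at the given times."""
    return {partitioner.partition(t) for t in send_times}
```

In this model, all sends inside one interval land on the same partition, and the load only spreads out across many intervals. A producer that stays on one partition for weeks would therefore not be explained by the sticky behavior alone.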

The issue you are reporting seems like a bug though. Which revision in 0.8
are you using?

Thanks,

Jun
On Fri, Sep 13, 2013 at 8:28 PM, prashant amar <[EMAIL PROTECTED]> wrote:

> Hi Guozhang, Joe, Drew
>
> In our case we have been running for the past 3 weeks and it has been
> consistently writing only to the first partition. The rest of the
> partitions have empty index files.
>
> Not sure if I am hitting any issue here.
>
> I am using the offset checker as my barometer. I also inspected the folder,
> and it indicates the same.
>
> On Friday, September 13, 2013, Guozhang Wang wrote:
>
> > Hello Joe,
> >
> > The reasons we make producers produce to a fixed partition during each
> > metadata-refresh interval are the following:
> >
> > https://issues.apache.org/jira/browse/KAFKA-1017
> >
> > https://issues.apache.org/jira/browse/KAFKA-959
> >
> > So, in short, the randomness is still preserved, but within one
> > metadata-refresh interval the assignment is fixed.
> >
> > I agree that the document should be updated accordingly.
> >
> > Guozhang
> >
> >
> > On Fri, Sep 13, 2013 at 1:48 PM, Joe Stein <[EMAIL PROTECTED]> wrote:
> >
> > > Isn't this a bug?
> > >
> > > I don't see why we would want users to have to code and generate random
> > > partition keys to randomly distribute the data to partitions. That is
> > > Kafka's job, isn't it?
> > >
> > > Or, if a null value is supplied, tell the user this is not supported
> > > (throw an exception) in KeyedMessage, as we do for topic, rather than
> > > treating null as a key to hash?
> > >
> > > My preference is to put those three lines back in and let the key be
> > > null and give folks randomness, unless it's not a bug and there is a
> > > good reason for it?
> > >
> > > Is there something about
> > > https://issues.apache.org/jira/browse/KAFKA-691 that requires the lines
> > > to be taken out? I haven't had a chance to look through it yet.
> > >
> > > My thought is that a new person coming in would expect to see the
> > > partitions filling up in a round-robin fashion, as our docs say, unless
> > > we force them in the API to know they have to do this themselves, or
> > > give them the ability for this to happen when passing nothing in.
> > >
> > > /*******************************************
> > >  Joe Stein
> > >  Founder, Principal Consultant
> > >  Big Data Open Source Security LLC
> > >  http://www.stealth.ly
> > >  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> > > ********************************************/
> > >
> > >
> > > On Fri, Sep 13, 2013 at 4:17 PM, Drew Goya <[EMAIL PROTECTED]> wrote:
> > >
> > > > I ran into this problem as well, Prashant.  The default partition key
> > > > was recently changed:
> > > >
> > > > https://github.com/apache/kafka/commit/b71e6dc352770f22daec0c9a3682138666f032be
> > > >
> > > > It no longer assigns a random partition to data with a null partition
> > > > key.  I had to change my code to generate random partition keys to get
> > > > the randomly distributed behavior the producer used to have.
> > > >
> > > >
> > > > On Fri, Sep 13, 2013 at 11:42 AM, prashant amar <[EMAIL PROTECTED]>
> > > > wrote:
> > > >
> > > > > Thanks Neha
> > > > >
> > > > > I will try applying this property and circle back.
> > > > >
> > > > > Also, I have been attempting to execute kafka-producer-perf-test.sh
> > > > > and I receive the following error:
> > > > >
> > > > >        Error: Could not find or load main class kafka.perf.ProducerPerformance

 