Kafka >> mail # user >> Kafka producer behavior


Re: Kafka producer behavior
Hi,

this is a gotcha with Kafka producer partitioning: you must send the
messages with a non-null key.
If the key is null, Kafka will not call the partitioner.

Because the key does not matter with this partitioner, you can pass in a
constant string like "1".
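As a rough illustration of the gotcha (hypothetical names, no Kafka dependency; this is not the actual Kafka source), the producer-side dispatch behaves roughly like this: a custom partitioner is only consulted when the key is non-null.

```java
import java.util.Random;

// Hypothetical sketch of the 0.8 producer's partition selection: a custom
// partitioner is only consulted when the message key is non-null.
public class PartitionSelection {
    static final Random RANDOM = new Random();

    // Stand-in for a custom partitioner such as the round-robin one discussed.
    static int partition(Object key, int numPartitions) {
        return Math.abs(key.hashCode()) % numPartitions;
    }

    static int selectPartition(Object key, int numPartitions) {
        if (key == null) {
            // Null key: the partitioner is bypassed entirely.
            return RANDOM.nextInt(numPartitions);
        }
        return partition(key, numPartitions);
    }

    public static void main(String[] args) {
        System.out.println(selectPartition("1", 2));  // partitioner is called
        System.out.println(selectPartition(null, 2)); // partitioner is skipped
    }
}
```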

Oh one more thing, on performance:

The producer's send method has a synchronized block on the producer
instance, which means performance goes down the drain.
I could only get 13K tps out of a single producer (on a 12-core, 72 GB RAM
machine). A way around this is to instantiate an array/list of N
producers and then round robin over them in your send code.
I got to 80K tps (for my use case) using 6 producer instances from a single
box sending to 3 Kafka servers.

e.g.
send ( msg ) {
  producers[ producer-index.getAndIncrement() % producer_count ].send(msg)
}
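A slightly fleshed-out sketch of that pool (FakeProducer is a hypothetical stand-in for the real Kafka producer; the point is only the round-robin index arithmetic):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the producer-pool round robin described above. FakeProducer
// stands in for the real Kafka producer; only the dispatch logic matters.
public class ProducerPool {
    static class FakeProducer {
        int sent = 0;
        void send(String msg) { sent++; }
    }

    final FakeProducer[] producers;
    private final AtomicInteger index = new AtomicInteger(0);

    ProducerPool(int n) {
        producers = new FakeProducer[n];
        for (int i = 0; i < n; i++) producers[i] = new FakeProducer();
    }

    // Masking with Integer.MAX_VALUE keeps the index non-negative even
    // after the counter wraps past Integer.MAX_VALUE.
    void send(String msg) {
        int i = (index.getAndIncrement() & Integer.MAX_VALUE) % producers.length;
        producers[i].send(msg);
    }

    public static void main(String[] args) {
        ProducerPool pool = new ProducerPool(3);
        for (int m = 0; m < 9; m++) pool.send("msg-" + m);
        for (FakeProducer p : pool.producers) System.out.println(p.sent); // 3 each
    }
}
```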

Regards,
 Gerrit
On Wed, Dec 18, 2013 at 11:24 AM, Hanish Bansal <
[EMAIL PROTECTED]> wrote:

> Thanks for response Gerrit and Guozhang !!
>
> Hi Gerrit,
>
> I am trying to use the same round robin partitioner shared by you, but no
> luck; round robin partitioning is still not working.
>
> I have successfully registered RoundRobinPartitioner in kafka producer.
>
> Code of the RoundRobinPartitioner class (counter is an AtomicInteger field):
>
>     private final AtomicInteger counter = new AtomicInteger(0);
>
>     public RoundRobinPartitioner(VerifiableProperties props) {
>         log.info("Using Round Robin Partitioner class...");
>     }
>
>     @Override
>     public int partition(String key, int partitions) {
>         log.info("Inside partition method");
>         int i = counter.getAndIncrement();
>         if (i == Integer.MAX_VALUE) {
>             counter.set(0);
>             return 0;
>         } else {
>             return i % partitions;
>         }
>     }
>
> When I produce the data, the first log message "Using Round Robin
> Partitioner class..." is printed, but the second message "Inside partition
> method" is not.
>
> From that we can tell that the RoundRobinPartitioner has been successfully
> registered, but the round robin logic is never called.
>
> Any idea what I am missing?
>
> Thanks in advance !!
>
>
>
> On Tue, Dec 17, 2013 at 5:59 PM, Guozhang Wang <[EMAIL PROTECTED]> wrote:
>
> > Hello,
> >
> > This is a known issue; see this JIRA:
> >
> > https://issues.apache.org/jira/browse/KAFKA-1067
> >
> > Guozhang
> >
> >
> > On Tue, Dec 17, 2013 at 8:48 AM, Gerrit Jansen van Vuuren <
> > [EMAIL PROTECTED]> wrote:
> >
> > > hi,
> > >
> > > I've had the same issue with the kafka producer.
> > >
> > > You need to use a different partitioner than the default one provided
> > > with Kafka.
> > > I've created a round robin partitioner that works well for equally
> > > distributing data across partitions.
> > >
> > >
> > > https://github.com/gerritjvv/pseidon/blob/master/pseidon-kafka/java/pseidon/kafka/util/RoundRobinPartitioner.java
> > >
> > > On Tue, Dec 17, 2013 at 5:32 PM, Hanish Bansal <
> > > [EMAIL PROTECTED]> wrote:
> > >
> > > > Hi All,
> > > >
> > > > We have a Kafka cluster of 2 nodes (using the 0.8.0 final release).
> > > > Replication factor: 2
> > > > Number of partitions: 2
> > > >
> > > > I have created a topic "test-topic1" in kafka.
> > > >
> > > > When I list the status of that topic using bin/kafka-list-topic.sh,
> > > > the status is:
> > > >
> > > > topic: test-topic1    partition: 0    leader: 0    replicas: 0,1    isr: 0,1
> > > > topic: test-topic1    partition: 1    leader: 1    replicas: 1,0    isr: 1,0
> > > >
> > > > As both partitions are on two separate nodes, when we produce data it
> > > > should go to both nodes.
> > > >
> > > > But when I insert data, it goes to only one node.
> > > >
> > > > For example, if I insert 1000 messages, all 1000 go to either node1
> > > > or node2. Data is not evenly distributed across the nodes.
> > > >
> > > > Expected: 500 messages should go to node1 and 500 messages should go