I suspect the threads are not each assigned an equal number of messages to
send. I don't think it matters whether you use one producer or more, as long
as you distribute the work among those threads equally.
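Here is a minimal sketch of "distribute the work equally" with one shared producer. It assumes a hypothetical `Sender` interface standing in for a thread-safe producer (it is not the Kafka API). Messages are dealt round-robin, so every thread sends the same count, give or take one:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

class EvenWorkSplit {
    // Hypothetical stand-in for a thread-safe producer; this is NOT the Kafka API.
    interface Sender {
        void send(String message);
    }

    // Deals messages round-robin: thread t handles indices t, t + n, t + 2n, ...
    // so every thread sends the same number of messages, give or take one.
    static long[] sendEvenly(List<String> messages, int numThreads, Sender sender) {
        AtomicLongArray perThread = new AtomicLongArray(numThreads);
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < numThreads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                for (int i = id; i < messages.size(); i += numThreads) {
                    sender.send(messages.get(i));
                    perThread.incrementAndGet(id);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        long[] counts = new long[numThreads];
        for (int t = 0; t < numThreads; t++) {
            counts[t] = perThread.get(t);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> messages = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            messages.add("log-" + i);
        }
        // One sender shared by all 16 threads; here it only counts sends.
        AtomicLong sent = new AtomicLong();
        long[] counts = sendEvenly(messages, 16, m -> sent.incrementAndGet());
        System.out.println(Arrays.toString(counts) + " total=" + sent.get());
    }
}
```

With 1000 messages and 16 threads, threads 0 to 7 each send 63 messages and threads 8 to 15 each send 62. Whether `Sender` wraps one shared producer or one producer per thread does not change how evenly the work is spread.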
On Wednesday, April 17, 2013, Helin Xiang wrote:
> We are using Kafka 0.7.2.
> The situation is a little complicated:
> 1. We use the Java API and multiple threads (about 16) to send logs to
> Kafka. Each thread has its own kafka.javaapi.producer.Producer.
> 2. There is one topic with 4 partitions, and we use random partitioning
> when sending.
> 3. We generate messages for this topic at a rate of 100 per second, so
> each thread only gets a few logs per second.
> But we find the 4 partitions receive unbalanced data: partition 0 gets 10
> times more logs than partitions 1, 2 and 3, which get nearly equal numbers
> of messages.
> After we set the number of threads to 1, the imbalance vanished.
> We are not sure what happens underneath the Java Producer API.
> Could anyone explain it?
> Or is it necessary to create a new kafka.javaapi.producer.Producer object
> in each thread? I hear the kafka.javaapi.producer.Producer class is
> thread-safe, but I don't know whether one producer object can handle high
> throughput.
> Best regards,
> Xiang Helin
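For comparison with the counts observed in the quoted setup, here is a self-contained simulation (no Kafka involved) of what uniform random partition choice from many threads looks like. It assumes each message picks a partition uniformly at random; `ThreadLocalRandom` is a stand-in chosen here, not necessarily what the 0.7.2 random partitioner uses internally. It only provides a balanced baseline to compare real per-partition counts against:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLongArray;

class RandomPartitionCheck {
    // Runs numThreads threads; each picks a partition uniformly at random for every
    // message and the shared counters record how many messages landed in each partition.
    static long[] simulate(int numThreads, int messagesPerThread, int numPartitions) {
        AtomicLongArray perPartition = new AtomicLongArray(numPartitions);
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < numThreads; t++) {
            Thread worker = new Thread(() -> {
                // ThreadLocalRandom must be obtained inside the thread that uses it.
                ThreadLocalRandom rnd = ThreadLocalRandom.current();
                for (int i = 0; i < messagesPerThread; i++) {
                    perPartition.incrementAndGet(rnd.nextInt(numPartitions));
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        long[] counts = new long[numPartitions];
        for (int p = 0; p < numPartitions; p++) {
            counts[p] = perPartition.get(p);
        }
        return counts;
    }

    public static void main(String[] args) {
        // 16 threads and 4 partitions, mirroring the quoted setup.
        long[] counts = simulate(16, 10_000, 4);
        System.out.println(Arrays.toString(counts));
    }
}
```

Each of the 4 partitions should come out close to a quarter of the 160,000 simulated messages, and changing the thread count to 1 should not change that picture.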