Re: only one ProducerSendThread thread when running with multiple brokers (kafka 0.8)
In 0.7, we had one producer send thread per broker. This changed in 0.8,
where there is only one send thread per producer instance. If a producer
needs to send messages to multiple brokers, the send thread does so
serially, which reduces throughput. We plan to improve that in 0.9
through client rewrites. For now, you can improve throughput by either
using a larger batch size or using more producer instances.
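
As a rough sketch of both workarounds, the snippet below builds one set of
async producer properties per producer instance with a larger batch size. It
uses only java.util.Properties; the class name, helper names, host names, the
batch size of 2000, and the queue size of twice the batch size are illustrative
assumptions, not recommended values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class ProducerConfigSketch {
    // Builds async producer properties using the 0.8 property names
    // that appear in the configuration quoted in this thread.
    static Properties asyncProps(String brokerList, int batchSize) {
        Properties p = new Properties();
        p.setProperty("metadata.broker.list", brokerList);
        p.setProperty("producer.type", "async");
        p.setProperty("batch.num.messages", Integer.toString(batchSize));
        // Assumption: keep the queue at twice the batch size, mirroring
        // the 1000/500 ratio in the original configuration.
        p.setProperty("queue.buffering.max.messages",
                Integer.toString(batchSize * 2));
        return p;
    }

    // One Properties object per producer instance. Each instance gets
    // its own send thread, so N instances give N send threads.
    static List<Properties> propsForInstances(String brokerList,
                                              int batchSize,
                                              int instances) {
        List<Properties> all = new ArrayList<>();
        for (int i = 0; i < instances; i++) {
            all.add(asyncProps(brokerList, batchSize));
        }
        return all;
    }

    public static void main(String[] args) {
        // Hypothetical broker addresses, for illustration only.
        String brokers = "broker1:9092,broker2:9092,broker3:9092";
        List<Properties> configs = propsForInstances(brokers, 2000, 3);
        for (Properties p : configs) {
            // In a real client each Properties would be wrapped in a
            // kafka.producer.ProducerConfig and passed to a producer.
            System.out.println(p);
        }
    }
}
```

The application would then spread its messages across the producer instances,
for example round-robin, so that no single send thread becomes the bottleneck.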

As for degraded performance with more partitions, are you specifying a key
for each message?
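
The reason for asking can be sketched as follows, assuming keyed messages are
assigned to partitions by the key's hash modulo the partition count (an
assumption for illustration; the class and method names are hypothetical and
are not the Kafka API). Under that assumption, one batch of 500 messages is
split into smaller per-partition groups as the partition count grows.

```java
import java.util.Map;
import java.util.TreeMap;

public class BatchSplitSketch {
    // Hypothetical stand-in for a hash-modulo partitioner.
    static int partitionFor(String key, int numPartitions) {
        // Mask the sign bit so the result is never negative.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    // Counts how many messages of one batch land in each partition.
    static Map<Integer, Integer> splitBatch(int batchSize, int numPartitions) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int i = 0; i < batchSize; i++) {
            int partition = partitionFor("key-" + i, numPartitions);
            counts.merge(partition, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Partition counts taken from the question: 1, 3 and 12.
        for (int n : new int[] {1, 3, 12}) {
            System.out.println(n + " partitions: " + splitBatch(500, n));
        }
    }
}
```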



On Wed, Jan 1, 2014 at 4:17 AM, yosi botzer <[EMAIL PROTECTED]> wrote:

> Hi,
> I am using Kafka 0.8. I have 3 machines, each running a Kafka broker.
> I am using the async mode of the producer. I expected to see 3 different
> threads with names starting with ProducerSendThread- (according to this
> article: http://engineering.gnip.com/kafka-async-producer/)
> However, I can see only one thread with the name *ProducerSendThread-*
> This is my producer configuration:
> server=1
> topic=dat7
> metadata.broker.list=ec2-54-245-111-112.us-west-2.compute.amazonaws.com:9092,ec2-54-245-111-69.us-west-2.compute.amazonaws.com:9092,ec2-54-218-183-14.us-west-2.compute.amazonaws.com:9092
> serializer.class=kafka.serializer.DefaultEncoder
> request.required.acks=1
> compression.codec=snappy
> producer.type=async
> queue.buffering.max.ms=2000
> queue.buffering.max.messages=1000
> batch.num.messages=500
> *What am I missing here?*
> BTW, I have also experienced very strange behavior regarding my producer
> performance (which may or may not be related to the issue above).
> When I defined a topic with 1 partition, I got much better throughput
> than with a topic with 3 partitions. A producer sending messages to a
> topic with 3 partitions had much better throughput than one sending to a
> topic with 12 partitions.
> I would expect the best performance from the topic with 12 partitions,
> since I have 3 machines, each running a broker with 4 disks (the broker
> is configured to use all 4 disks).
> *Is there any logical explanation for this behavior?*