Kafka >> mail # user >> only one ProducerSendThread thread when running with multiple brokers (kafka 0.8)


Re: only one ProducerSendThread thread when running with multiple brokers (kafka 0.8)
When the producer send thread sends a batch of messages, it first
determines which partition each message should go to. It then groups
messages by broker (based on the leader of the partition of each
message) and sends a produce request per broker (each request may include
multiple partitions). Those produce requests are sent serially. So, if
there is only one partition, only 1 produce request needs to be sent per
batch of messages. If there are 3 partitions, chances are 3 produce
requests are needed. Because those produce requests are sent serially, the
more partitions you have, the more produce requests and the longer the
latency for sending a batch of messages. However, having 12 partitions
shouldn't be significantly worse than 3 partitions since there are only 3
brokers. One way to improve performance is to use a larger batch size. Try
making the batch size 3 times larger with 3 partitions.
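
The grouping described above can be sketched in a few lines of Python. This is only an illustration, not the Kafka 0.8 producer code: the function names `plan_produce_requests` and `leaders` are hypothetical, the keys are assumed to be integers partitioned by a simple modulo, and partition leaders are assumed to be spread round-robin over 3 brokers. The batch size of 500 matches `batch.num.messages=500` from the original question.

```python
from collections import defaultdict


def leaders(num_partitions, num_brokers):
    # Hypothetical leader assignment: partition p is led by broker p % num_brokers.
    return {p: p % num_brokers for p in range(num_partitions)}


def plan_produce_requests(keys, num_partitions, leader_of):
    """Group one batch as described: partition per message, then one request per broker.

    Returns {broker: {partition: [keys]}}; each top-level entry is one produce
    request, and the send thread would issue these requests one after another.
    """
    requests = defaultdict(lambda: defaultdict(list))
    for key in keys:
        partition = key % num_partitions  # stand-in partitioner (assumption)
        broker = leader_of[partition]
        requests[broker][partition].append(key)
    return {broker: dict(parts) for broker, parts in requests.items()}


def request_sizes(plan):
    # Number of messages carried by each per-broker produce request.
    return sorted(sum(len(msgs) for msgs in parts.values()) for parts in plan.values())


if __name__ == "__main__":
    for batch_size, num_partitions in [(500, 1), (500, 3), (500, 12), (1500, 3)]:
        batch = list(range(batch_size))
        plan = plan_produce_requests(batch, num_partitions, leaders(num_partitions, 3))
        print(batch_size, num_partitions, len(plan), request_sizes(plan))
```

Under these assumptions, a 500-message batch becomes one request of 500 messages with 1 partition, and three serial requests of roughly 167 messages each with either 3 or 12 partitions, since the request count is bounded by the number of brokers. Tripling the batch to 1500 messages brings each of the three requests back to 500 messages.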

Thanks,

Jun
On Wed, Jan 1, 2014 at 12:43 PM, yosi botzer <[EMAIL PROTECTED]> wrote:

> Yes I am specifying a key for each message.
>
> The same producer code runs much slower when sending messages to a topic
> with multiple partitions compared to a topic with a single partition. This
> doesn't make any sense to me at all.
>
> If I understand correctly I need multiple partitions in order to scale the
> consumers.
>
> Could it be because the async producer is creating a connection per broker
> (or per partition) and this is done in a serial way once the producer needs
> to send the messages? Maybe when using a single partition the producer is
> doing it in one batch.
>
> BTW, I have tried using multiple Producer instances but I still get poor
> performance when using a topic with multiple partitions (by multiple
> partitions I mean 12, which is exactly the number of broker machines
> multiplied by the number of disks I have on each machine, which sounds
> reasonable to me).
>
> Is there any solution anyone can think of?
>
>
> Yosi
>
>
>
> On Wed, Jan 1, 2014 at 7:57 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
>
> > In 0.7, we have 1 producer send thread per broker. This is changed in 0.8,
> > where there is only 1 producer send thread per producer. If a producer
> > needs to send messages to multiple brokers, the send thread will do that
> > serially, which will reduce the throughput. We plan to improve that in 0.9
> > through client rewrites. For now, you can improve the throughput by either
> > using a larger batch size or using more producer instances.
> >
> > As for degraded performance with more partitions, are you specifying a key
> > for each message?
> >
> > Thanks,
> >
> > Jun
> >
> > On Wed, Jan 1, 2014 at 4:17 AM, yosi botzer <[EMAIL PROTECTED]> wrote:
> >
> > > Hi,
> > >
> > > I am using Kafka 0.8. I have 3 machines, each running a Kafka broker.
> > >
> > > I am using the async mode of my Producer. I expected to see 3 different
> > > threads with names starting with ProducerSendThread- (according to this
> > > article: http://engineering.gnip.com/kafka-async-producer/)
> > >
> > > However I can see only one thread with the name *ProducerSendThread-*
> > >
> > > This is my producer configuration:
> > >
> > > server=1
> > > topic=dat7
> > > metadata.broker.list=ec2-54-245-111-112.us-west-2.compute.amazonaws.com:9092,ec2-54-245-111-69.us-west-2.compute.amazonaws.com:9092,ec2-54-218-183-14.us-west-2.compute.amazonaws.com:9092
> > > serializer.class=kafka.serializer.DefaultEncoder
> > > request.required.acks=1
> > > compression.codec=snappy
> > > producer.type=async
> > > queue.buffering.max.ms=2000
> > > queue.buffering.max.messages=1000
> > > batch.num.messages=500
> > >
> > >
> > > *What am I missing here?*
> > >
> > >
> > > BTW, I have also experienced very strange behavior regarding my producer
> > > performance (which may or may not be related to the issue above).
> > >
> > > When I defined a topic with 1 partition I got much better throughput
> > > compared to a topic with 3 partitions. A producer sending messages to