I am trying to understand how fast Kafka 0.7 is compared to what I can get from the hard drive. In essence, I have 3 questions.
In all tests below, I'm using a single broker with a single one-partition topic. The Kafka perf tests have been run in 2 deployment configs:
- broker and perf-test on the same host
- broker and perf-test on different hosts (the results are practically the same, so I won't post them here)

I'm using FIO (http://freecode.com/projects/fio) to benchmark the speed of the hard drives.
socket.send.buffer=16777216
socket.receive.buffer=16777216
max.socket.request.bytes=104857600
log.flush.interval=10000
log.default.flush.interval.ms=1000
log.default.flush.scheduler.interval.ms=1000
num.threads=[num of cores]

For kafka-producer-perf-test, I'm assuming that the IO access pattern is sequential write.
Here is the test I ran with FIO:
[sequential-write]
rw=write
size=50G
ioengine=sync
numjobs=1
directory=/tmp/fio
filename=redo01.log

Here is the Kafka performance test:
Question 2: Can something be done to improve consumer performance?
Question 3 (most important for me): What might be the reasons for the consumer to behave so badly on the fastest hardware available? I see in iostat that the consumer issues very few read requests to the hard drive.
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.16    0.00    0.09    0.06    0.03   97.66

Besides that, even if the whole topic is in the IO cache, the consumer speed is about 45 MB/s, which is still well below my expectations.
And the picture doesn't change across deployment configs (broker and test on the same node or on 2 different nodes):
1) Clients and broker on the same host (all the results I've shown are for this configuration)
2) Client and broker on different hosts with 1 Gbit/s of network bandwidth between them (verified with iperf)
The results are practically the same, except that for consumer-perf-test on hi1.4xlarge I am seeing a small improvement with the second deployment configuration: 20 MB/s, which is a little bit odd.

2013/8/30 Jun Rao <[EMAIL PROTECTED]>
Benjamin, do you mean a thread on the client side? I don't quite understand what I'm limited by. Can you please explain a little bit more?
A single-threaded producer is still capable of doing 50 MB/s on hi1.4xlarge, which is considerably slower than the 377 MB/s from a single FIO job, but still 5 times faster than what I'm getting from the consumer. Is that expected?
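To put the numbers above side by side, here is a small sketch of the arithmetic. It only uses figures quoted in this thread; the consumer figure is derived from the "5 times faster" statement, and comparing a read-side consumer against a write-side FIO job is a rough yardstick rather than a like-for-like benchmark. The names FIO_SEQ_WRITE, PRODUCER, CONSUMER and fraction_of_disk are made up for this illustration.

```python
# Throughput figures quoted in this thread, in MB/s.
FIO_SEQ_WRITE = 377.0     # single FIO sequential-write job on hi1.4xlarge
PRODUCER = 50.0           # single-threaded producer
CONSUMER = PRODUCER / 5   # "5 times faster than the consumer" implies ~10 MB/s


def fraction_of_disk(throughput_mb_s, disk_mb_s=FIO_SEQ_WRITE):
    """Return a throughput as a fraction of the raw disk figure."""
    return throughput_mb_s / disk_mb_s


for name, value in [("producer", PRODUCER), ("consumer", CONSUMER)]:
    print(f"{name}: {value:.0f} MB/s = {fraction_of_disk(value):.1%} of the FIO figure")
```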
Another mystery for me is that with a hot IO cache (the whole topic is in memory), I'm getting 50 MB/s - 100 MB/s (this huge std. dev. bugs me too) from a single-threaded consumer.
And when the cache is cold, I'm not seeing the Kafka broker make the best possible use of the SSD it has. I've tried setting fetch-size to 100 MB, but Kafka still hits the disk at only 10 MB/s (the disk by itself can satisfy many more read requests at the same latency and provide much higher throughput).
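One way to get a read-side baseline that is independent of Kafka is to time plain sequential reads with different request sizes. The sketch below is only an illustration under stated assumptions: read_throughput and make_test_file are hypothetical helper names, the file is a small temporary file rather than a real Kafka log segment, and because the file has just been written it will most likely be served from the page cache, so the numbers correspond to the hot-cache case, not the cold one.

```python
import os
import tempfile
import time


def read_throughput(path, request_bytes):
    """Read `path` sequentially in `request_bytes` requests.

    Returns (bytes_read, throughput in MB/s).
    """
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:  # unbuffered: one read() per request
        while True:
            chunk = f.read(request_bytes)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total, total / (1024 * 1024) / max(elapsed, 1e-9)


def make_test_file(size_bytes):
    """Create a temporary file filled with random bytes and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(size_bytes))
    return path


if __name__ == "__main__":
    path = make_test_file(8 * 1024 * 1024)
    try:
        for request in (64 * 1024, 1024 * 1024, 4 * 1024 * 1024):
            total, mb_s = read_throughput(path, request)
            print(f"request={request // 1024} KiB: {mb_s:.0f} MB/s")
    finally:
        os.remove(path)
```

To approximate the cold-cache case, the same function could be pointed at a large existing file after the OS page cache has been dropped; how to do that depends on the operating system.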
It is hard to say where the bottleneck is just from your description. Would it be possible for you to rerun the consumer test using hprof on the consumer, so we can understand whether the fetcher is waiting on the fetches (i.e. the broker is the bottleneck) or on the enqueue (i.e. the consumer is the bottleneck)? Likewise, for the producer test it would be good to do the same for the broker and producer processes to understand what is happening there.
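The distinction between "waiting on the fetches" and "waiting on the enqueue" can be illustrated with a toy model. This is not Kafka's consumer code: it is a minimal sketch in which a fetcher thread hands chunks to a processing loop through a bounded queue, with sleeps standing in for the broker round trip and for consumer-side work. The name run_pipeline and all parameters are hypothetical.

```python
import queue
import threading
import time


def run_pipeline(n_chunks, fetch_delay_s, process_delay_s, queue_capacity=1):
    """Return the seconds the fetcher spent waiting on fetches vs. on the enqueue."""
    chunks = queue.Queue(maxsize=queue_capacity)
    waits = {"fetch": 0.0, "enqueue": 0.0}

    def fetcher():
        for i in range(n_chunks):
            t0 = time.perf_counter()
            time.sleep(fetch_delay_s)   # stand-in for a broker round trip
            t1 = time.perf_counter()
            chunks.put(i)               # blocks while the queue is full
            t2 = time.perf_counter()
            waits["fetch"] += t1 - t0
            waits["enqueue"] += t2 - t1
        chunks.put(None)                # sentinel: no more data

    thread = threading.Thread(target=fetcher)
    thread.start()
    while chunks.get() is not None:
        time.sleep(process_delay_s)     # stand-in for consumer-side work
    thread.join()
    return waits


if __name__ == "__main__":
    print("slow broker:  ", run_pipeline(20, 0.005, 0.0))
    print("slow consumer:", run_pipeline(20, 0.0, 0.005))
```

In the first run the fetcher's time goes almost entirely to the fetch, and in the second almost entirely to the blocked enqueue. A profile of the real consumer would be read the same way: whichever of the two waits dominates points at the side that is the bottleneck.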
-Jay

On Fri, Aug 30, 2013 at 1:34 AM, Rafael Bagmanov <[EMAIL PROTECTED]> wrote:
Your producer test uses a thread per core. Your consumer test uses a single thread. A single thread is likely insufficient to get maximum throughput.

On Aug 30, 2013 8:46 AM, "Rafael Bagmanov" <[EMAIL PROTECTED]> wrote:
Benjamin Black 2013-08-30, 17:57