Re: Relationship between Zookeeper and Kafka
Scott Clasen 2013-05-20, 17:17
Ahh, yeah, PIOPS is definitely faster than standard EBS, but still much
slower than local disk.
You could try benchmarking local disk to see what the instances you are
using are capable of, then try tweaking IOPS etc. to see where you get.
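For a rough first comparison, something like the following could work. It is only a minimal sketch of a sequential-write test, not a replacement for a real benchmarking tool, and it says nothing about how Kafka itself writes to disk. The function name and parameters are made up for illustration.

```python
import os
import tempfile
import time


def sequential_write_mb_per_s(directory, total_mb=64, chunk_kb=1024):
    """Write about total_mb of data to a temp file in `directory` and return MB/s.

    The file is fsync'ed before the clock stops, so the figure includes the
    time needed to push the data to the device, not just to the page cache.
    """
    chunk = os.urandom(chunk_kb * 1024)
    chunks = max(1, (total_mb * 1024) // chunk_kb)
    written_mb = chunks * chunk_kb / 1024

    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(chunks):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # clean up the scratch file either way
    return written_mb / elapsed
```

Run it once against a directory on the EBS volume and once against a directory on the instance-local disk (for example `sequential_write_mb_per_s("/mnt/ebs")` and `sequential_write_mb_per_s("/mnt/ephemeral")`, both hypothetical mount points), with a `total_mb` large enough that caching effects are small.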
M1.Larges aren't super fast, so your MacBook beating them isn't surprising to me.
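To put the event rates quoted below into disk terms, here is a small back-of-the-envelope calculation. It assumes "2K" means 2048 bytes and that load is spread evenly over the 5 brokers; neither is confirmed in the thread, and it ignores protocol and log overhead.

```python
MESSAGE_BYTES = 2 * 1024   # assumption: "2K message size" means 2048 bytes
BROKERS = 5                # cluster size from the thread


def payload_mb_per_s(events_per_s, message_bytes=MESSAGE_BYTES):
    """Raw message payload rate in MiB/s, ignoring any overhead."""
    return events_per_s * message_bytes / (1024 * 1024)


def summarize(events_per_s, brokers=BROKERS):
    """Return total and per-broker payload rates, assuming an even spread."""
    total = payload_mb_per_s(events_per_s)
    return {"total_mb_s": total, "per_broker_mb_s": total / brokers}


observed = summarize(20_000)  # total 39.0625 MB/s, 7.8125 MB/s per broker
target = summarize(50_000)    # total 97.65625 MB/s, 19.53125 MB/s per broker
```

How those per-broker write rates compare with what a 600 provisioned IOPS volume can sustain depends on the I/O size of each write, which is not stated here, so that is worth measuring rather than guessing.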
On Mon, May 20, 2013 at 10:01 AM, Jason Weiss <[EMAIL PROTECTED]> wrote:
> Hi Scott.
> I'm using Kafka 0.7.2. I am using the default replication factor, since I
> don't recall changing that configuration at all.
> I'm using provisioned IOPS, which from attending the AWS event in NYC a
> few weeks ago was presented as the "fastest storage option" for EC2. A
> number of partners presented success stories in terms of throughput with
> provisioned IOPS. I've tried to follow that model.
> On 5/20/13 12:56 PM, "Scott Clasen" <[EMAIL PROTECTED]> wrote:
> >My guess, EBS is likely your bottleneck. Try running on instance local
> >disks, and compare your results. Is this 0.8? What replication factor are
> >you using?
> >On Mon, May 20, 2013 at 8:11 AM, Jason Weiss <[EMAIL PROTECTED]> wrote:
> >> I'm trying to maximize my throughput and seem to have hit a ceiling.
> >> Everything described below is running in AWS.
> >> I have configured a Kafka cluster with 5 machines, M1.Large, with 600
> >> provisioned IOPS storage for each EC2 instance. I have a Zookeeper instance
> >> (we aren't in production yet, so I didn't take the time to set up a ZK
> >> cluster). Publishing to a single topic from 7 different clients, I seem to
> >> max out at around 20,000 eps with a fixed 2K message size. Each broker
> >> defines 10 file segments, with a 25000 message / 5 second flush
> >> configuration in server.properties. I have stuck with 8 threads. My
> >> producers (Java) are configured with batch.num.messages at 50, and
> >> queue.buffering.max.messages at 100.
> >> When I went from 4 servers in the cluster to 5 servers, I only saw an
> >> increase of about 500 events per second in throughput. In sharp contrast,
> >> when I run a complete environment on my MacBook Pro, tuned as described
> >> above but with a single ZK and a single Kafka broker, I am seeing 61,000
> >> events per second. I don't think I'm network constrained in the AWS
> >> environment (producer side) because when I add one more client, my MacBook
> >> Pro, I see a proportionate decrease in EC2 client throughput, and the
> >> result is an identical 20,000 eps. Stated differently, my EC2 instances give
> >> up throughput when my local MacBook Pro joins the array of producers, such
> >> that the throughput is exactly the same.
> >> Does anyone have any additional suggestions on what else I could tune to
> >> try and hit our goal, 50,000 eps with a 5 machine cluster? Based on the
> >> whitepapers published, LinkedIn describes a peak of 170,000 events per
> >> second across their cluster. My 20,000 seems so far away from their
> >> production figures.
> >> What is the relationship, in terms of performance, between ZK and Kafka?
> >> Do I need to have a more performant ZK cluster, the same, or does it
> >> not matter in terms of maximizing throughput?
> >> Thanks for any suggestions. I've been pulling knobs and turning levers on
> >> this for several days now.
> >> Jason
> >> This electronic message contains information which may be confidential or
> >> privileged. The information is intended for the use of the individual or
> >> entity named above. If you are not the intended recipient, be aware that
> >> any disclosure, copying, distribution or use of the contents of this
> >> information is prohibited. If you have received this electronic
> >> transmission in error, please notify us by e-mail at (
> >> [EMAIL PROTECTED]) immediately.