Kafka, mail # user - Relationship between Zookeeper and Kafka


Re: Relationship between Zookeeper and Kafka
Scott Clasen 2013-05-20, 16:56
My guess is that EBS is your bottleneck. Try running on instance-local
disks and compare your results. Is this 0.8? What replication factor are
you using?
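
For a rough sense of why EBS is the first suspect, here is a back-of-envelope sketch in Python using the figures from the message quoted below (20,000 events per second, 2 KB messages, 5 brokers, 600 provisioned IOPS per volume). The 16 KiB I/O size used to turn IOPS into bytes per second is an assumption, as is an even spread of writes across brokers, and the function names are made up for illustration. Treat the result as an estimate, not a measurement.

```python
# Back-of-envelope estimate: per-broker write rate vs. a rough EBS ceiling.
# All inputs come from the quoted message except IO_SIZE_BYTES, which is an
# assumption about how large each I/O operation against the volume is.

KIB = 1024
MIB = 1024 * KIB

IO_SIZE_BYTES = 16 * KIB  # ASSUMPTION: size of one I/O against the volume


def per_broker_write_rate(events_per_sec, message_bytes, brokers,
                          replication_factor=1):
    """Bytes/s each broker must write, assuming load spreads evenly."""
    total = events_per_sec * message_bytes * replication_factor
    return total / brokers


def volume_ceiling(provisioned_iops, io_size_bytes=IO_SIZE_BYTES):
    """Rough upper bound on bytes/s that one volume can absorb."""
    return provisioned_iops * io_size_bytes


if __name__ == "__main__":
    ceiling = volume_ceiling(600)
    for rf in (1, 2, 3):
        rate = per_broker_write_rate(20_000, 2 * KIB, brokers=5,
                                     replication_factor=rf)
        print(f"replication factor {rf}: {rate / MIB:.1f} MiB/s per broker "
              f"({rate / ceiling:.0%} of a {ceiling / MIB:.1f} MiB/s ceiling)")
```

Under these assumptions, a replication factor of 1 already puts each broker at a bit over 80% of the estimated ceiling, and a replication factor of 2 would exceed it, which is why the replication factor matters here. Running the same test on instance-local disks takes that ceiling out of the picture, so the comparison should show fairly quickly whether EBS is the limit.
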
On Mon, May 20, 2013 at 8:11 AM, Jason Weiss <[EMAIL PROTECTED]> wrote:

> I'm trying to maximize my throughput and seem to have hit a ceiling.
> Everything described below is running in AWS.
>
> I have configured a Kafka cluster with 5 machines (m1.large), each EC2
> instance with 600 provisioned IOPS storage. I have a single ZooKeeper
> server (we aren't in production yet, so I didn't take the time to set up
> a ZK cluster). Publishing to a single topic from 7 different clients, I
> seem to max out at around 20,000 events per second (eps) with a fixed 2K
> message size. Each broker defines 10 file segments, with a 25000 message /
> 5 second flush configuration in server.properties. I have stuck with 8
> threads. My producers (Java) are configured with batch.num.messages at 50
> and queue.buffering.max.messages at 100.
>
> When I went from 4 servers in the cluster to 5 servers, I only saw an
> increase of about 500 events per second in throughput. In sharp contrast,
> when I run a complete environment on my MacBook Pro, tuned as described
> above but with a single ZK and a single Kafka broker, I am seeing 61,000
> events per second. I don't think I'm network constrained in the AWS
> environment (producer side) because when I add one more client, my MacBook
> Pro, I see a proportionate decrease in EC2 client throughput, and the net
> result is an identical 20,000 eps. Stated differently, my EC2 instances
> give up throughput when my local MacBook Pro joins the array of producers,
> so the total throughput stays exactly the same.
>
> Does anyone have any additional suggestions on what else I could tune to
> try and hit our goal, 50,000 eps with a 5 machine cluster? Based on the
> whitepapers published, LinkedIn describes a peak of 170,000 events per
> second across their cluster. My 20,000 seems so far away from their
> production figures.
>
> What is the relationship, in terms of performance, between ZK and Kafka?
> Do I need to have a more performant ZK cluster, the same, or does it really
> not matter in terms of maximizing throughput?
>
> Thanks for any suggestions – I've been pulling knobs and turning levers on
> this for several days now.
>
>
> Jason
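
On the producer settings in the quoted message, a quick sketch of what batch.num.messages at 50 and queue.buffering.max.messages at 100 work out to with 2 KB messages and 7 clients sharing 20,000 eps. The helper names are hypothetical, and the sketch assumes the producers run in async mode (where these two settings apply) and that load is split evenly across clients; neither is stated in the message.

```python
# Rough arithmetic on the producer batching settings from the quoted message.
# ASSUMPTIONS: async producers, load split evenly across clients, and every
# message exactly 2 KiB. The function names are illustrative only.

KIB = 1024


def batch_profile(batch_num_messages, queue_max_messages, message_bytes):
    """Return (bytes per full batch, whole batches the queue can hold)."""
    batch_bytes = batch_num_messages * message_bytes
    batches_buffered = queue_max_messages // batch_num_messages
    return batch_bytes, batches_buffered


def batches_per_sec_per_client(events_per_sec, clients, batch_num_messages):
    """Full batches each client must send per second at the given rate."""
    return events_per_sec / clients / batch_num_messages


if __name__ == "__main__":
    batch_bytes, buffered = batch_profile(50, 100, 2 * KIB)
    rate = batches_per_sec_per_client(20_000, clients=7, batch_num_messages=50)
    print(f"one batch is {batch_bytes // KIB} KiB; the queue holds {buffered} batches")
    print(f"each client sends about {rate:.0f} batches per second")
```

With these numbers the queue has room for only two full batches, so a producer has very little slack if a send stalls. Raising both values is a cheap experiment, though it would not help if the brokers' disks turn out to be the limit.
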