Kafka, mail # user - Apache Kafka in AWS


+
Jason Weiss 2013-05-22, 20:42
-
Re: Apache Kafka in AWS
Neha Narkhede 2013-05-22, 20:57
Thanks for sharing your experience with the community, Jason!

-Neha
On Wed, May 22, 2013 at 1:42 PM, Jason Weiss <[EMAIL PROTECTED]> wrote:

> All,
>
> I asked a number of questions of the group over the last week, and I'm
> happy to report that I've had great success getting Kafka up and running in
> AWS. I am using 3 EC2 instances, each of which is an M2 High-Memory
> Quadruple Extra Large with 8 cores and 68.4 GiB of memory according to the
> AWS specs. I have co-located ZooKeeper instances next to Kafka on each
> machine.
>
> I am able to publish in a repeatable fashion 273,000 events per second,
> with each event payload consisting of a fixed size of 2048 bytes! This
> represents the maximum throughput possible on this configuration, as the
> servers became CPU constrained, averaging 97% utilization in a relatively
> flat line. This isn't a "burst" speed – it represents a sustained
> throughput from 20 M1 Large EC2 Kafka multi-threaded producers. Putting
> this into perspective, if my log retention period was a month, I'd be
> aggregating 1.3 petabytes of data on my disk drives. Suffice it to say, I
> don't see us retaining data for more than a few hours!
>
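The retention figure above can be checked with a few lines of arithmetic. This is only a sketch: it assumes a 30-day month and counts raw payload bytes, ignoring message headers, index files, replication, and compression. The function name `retained_bytes` is made up for illustration.

```python
# Back-of-the-envelope check of the numbers quoted in the message.
EVENTS_PER_SECOND = 273_000   # sustained rate reported in the message
PAYLOAD_BYTES = 2048          # fixed event payload size reported in the message
SECONDS_PER_DAY = 24 * 60 * 60


def retained_bytes(events_per_second, payload_bytes, retention_days):
    """Raw payload bytes accumulated over the retention period (assumption: no overhead)."""
    return events_per_second * payload_bytes * retention_days * SECONDS_PER_DAY


bytes_per_second = EVENTS_PER_SECOND * PAYLOAD_BYTES
month_bytes = retained_bytes(EVENTS_PER_SECOND, PAYLOAD_BYTES, 30)  # assumed 30-day month

print(f"{bytes_per_second / 1e6:.1f} MB/s")                            # 559.1 MB/s
print(f"{month_bytes / 1e15:.2f} PB, {month_bytes / 2**50:.2f} PiB")  # 1.45 PB, 1.29 PiB
```

So the quoted 1.3 petabytes matches the binary (PiB) figure. Run the same way, a four-hour retention window comes to roughly 8 TB across the cluster.
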
> Here were the keys to tuning for future folks to consider:
>
> First and foremost, be sure to configure your Java heap size accordingly
> when you launch Kafka. The default is like 512MB, which in my case left
> virtually all of my RAM inaccessible to Kafka.
> Second, stay away from OpenJDK. No, seriously – this was a huge thorn in
> my side, and I almost gave up on Kafka because of the problems I
> encountered. The OpenJDK NIO functions repeatedly resulted in Kafka
> crashing and burning in dramatic fashion. The moment I switched over to
> Oracle's JDK for Linux, Kafka didn't puke once. I mean, like not even a
> hiccup.
> Third, know your message size. In my opinion, the more you understand about
> your event payload characteristics, the better you can tune the system. The
> two knobs to really turn are the log.flush.interval and
> log.default.flush.interval.ms. The values here are intrinsically
> connected to the types of payloads you are putting through the system.
> Fourth and finally, to maximize throughput you have to code against the
> async paradigm, and be prepared to tweak the batch size, queue properties,
> and compression codec (wait for it…) in a way that matches the message
> payload you are putting through the system and the capabilities of the
> producer system itself.
>
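One way to picture how the two flush knobs relate to payload size is sketched below. It is not taken from the message: the idea of a per-flush byte budget, the 4 MiB figure, and both function names are assumptions for illustration. The result is the kind of value you might try for a message-count flush setting such as log.flush.interval, and the second helper shows how many messages can sit unflushed during a time-based interval.

```python
def flush_interval_messages(payload_bytes, target_flush_bytes):
    """Message count per flush so each flush writes about target_flush_bytes.

    Hypothetical heuristic: the byte budget is something you would pick
    from your own disk and durability requirements.
    """
    if payload_bytes <= 0 or target_flush_bytes <= 0:
        raise ValueError("sizes must be positive")
    return max(1, target_flush_bytes // payload_bytes)


def max_unflushed_messages(events_per_second, flush_interval_ms):
    """Upper bound on messages arriving during one time-based flush interval."""
    if events_per_second < 0 or flush_interval_ms < 0:
        raise ValueError("rates and intervals must be non-negative")
    return events_per_second * flush_interval_ms // 1000


# 2048-byte events with an assumed 4 MiB budget per flush
print(flush_interval_messages(2048, 4 * 1024 * 1024))  # 2048

# 273,000 events/s spread evenly over 3 brokers, assumed 1-second interval
print(max_unflushed_messages(273_000 // 3, 1000))       # 91000
```

Smaller payloads push the message-count setting up for the same byte budget, which is one concrete sense in which the values depend on the payload.
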
>
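The async producer settings mentioned in the fourth point could be collected in a properties file along these lines. This is a sketch under assumptions: the message does not name the properties or the Kafka version, so the keys below (`producer.type`, `batch.size`, `queue.size`, `queue.time`, `compression.codec`) are the 0.7-era producer names as best I recall them, and the values are placeholders, not recommendations. Check the configuration documentation for the version you run.

```python
def async_producer_properties(batch_size, queue_size, queue_time_ms, compression_codec):
    """Render async producer settings as Java-properties text.

    Key names are assumed from the Kafka 0.7 producer configuration;
    they are not given in the original message.
    """
    if batch_size < 1 or queue_size < 1:
        raise ValueError("batch_size and queue_size must be at least 1")
    if batch_size > queue_size:
        raise ValueError("a batch cannot be larger than the queue feeding it")
    props = {
        "producer.type": "async",                # send from a background thread
        "batch.size": batch_size,                # messages per batch
        "queue.size": queue_size,                # max messages buffered in the producer
        "queue.time": queue_time_ms,             # max ms to buffer before sending
        "compression.codec": compression_codec,  # codec id; meaning depends on the version
    }
    return "".join(f"{key}={value}\n" for key, value in props.items())


# Placeholder values only; tune them against your own payloads.
print(async_producer_properties(batch_size=200, queue_size=10_000,
                                queue_time_ms=5_000, compression_codec=1))
```

Generating the file from a function makes it easy to sweep batch and queue sizes during a load test and keep a record of which combination produced which throughput.
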
> Jason

 
+
Ken Krugler 2013-05-22, 21:24
+
Scott Clasen 2013-05-22, 23:27
+
Jonathan Hodges 2013-05-22, 23:11
+
Scott Clasen 2013-05-22, 23:56
+
Ken Krugler 2013-05-23, 01:00
+
Jun Rao 2013-05-23, 04:17
+
Jun Rao 2013-05-23, 14:11
+
Jason Weiss 2013-05-23, 14:13
+
S Ahmed 2013-05-28, 19:48
+
S Ahmed 2013-05-29, 17:40