Kafka, mail # user - Apache Kafka in AWS


Re: Apache Kafka in AWS
Ken Krugler 2013-05-23, 01:00
Hi Jason,

On May 22, 2013, at 3:35pm, Jason Weiss wrote:

> Ken,
>
> Great question! I should have indicated I was using EBS, 500GB with 2000 provisioned IOPS.

OK, thanks. Sounds like you were pegged on CPU usage.

But that does surprise me a bit. Did you check that you were using all cores?

Thanks,

-- Ken

PS - back in 2006 I spent a week of hell debugging an occasional job failure on Hadoop (this was when it was still part of Nutch). Turns out one of our 12 slaves was accidentally using OpenJDK, and this had a JIT compiler bug that would occasionally rear its ugly head. Obviously the Sun/Oracle JRE isn't bug-free, but it gets a lot more stress testing. So one of my basic guidelines in the ops portion of the Hadoop class I teach is that every server must have exactly the same version of Oracle's JRE.

> ________________________________________
> From: Ken Krugler [[EMAIL PROTECTED]]
> Sent: Wednesday, May 22, 2013 17:23
> To: [EMAIL PROTECTED]
> Subject: Re: Apache Kafka in AWS
>
> Hi Jason,
>
> Thanks for the notes.
>
> I'm curious whether you went with using local drives (ephemeral storage) or EBS, and if with EBS then what IOPS.
>
> Thanks,
>
> -- Ken
>
> On May 22, 2013, at 1:42pm, Jason Weiss wrote:
>
>> All,
>>
>> I asked a number of questions of the group over the last week, and I'm happy to report that I've had great success getting Kafka up and running in AWS. I am using 3 EC2 instances, each of which is an M2 High-Memory Quadruple Extra Large with 8 cores and 58.4 GiB of memory according to the AWS specs. I have co-located ZooKeeper instances next to Kafka on each machine.
>>
>> I am able to publish, in a repeatable fashion, 273,000 events per second, with each event payload having a fixed size of 2048 bytes! This represents the maximum throughput possible on this configuration, as the servers became CPU constrained, averaging 97% utilization in a relatively flat line. This isn't a "burst" speed – it represents a sustained throughput from 20 M1 Large EC2 Kafka multi-threaded producers. Putting this into perspective, if my log retention period were a month, I'd be aggregating 1.3 petabytes of data on my disk drives. Suffice it to say, I don't see us retaining data for more than a few hours!
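
The retention figure quoted above can be checked with a few lines of arithmetic. This is only a rough sketch: it counts payload bytes alone, takes "a month" to mean 30 days (an assumption), and ignores per-message overhead and compression.

```python
# Figures quoted in the message above.
EVENTS_PER_SECOND = 273_000
PAYLOAD_BYTES = 2048

# Assumption: "a month" is taken as 30 days.
RETENTION_DAYS = 30
SECONDS_PER_DAY = 86_400

bytes_per_second = EVENTS_PER_SECOND * PAYLOAD_BYTES
bytes_per_month = bytes_per_second * SECONDS_PER_DAY * RETENTION_DAYS

print(f"{bytes_per_second / 1e6:.0f} MB/s sustained")
print(f"{bytes_per_month / 1e15:.2f} PB (decimal units)")
print(f"{bytes_per_month / 2**50:.2f} PiB (binary units)")
```

Under these assumptions the binary-unit result rounds to the 1.3 petabytes mentioned in the message; in decimal units the same volume is closer to 1.45 PB.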
>>
>> Here were the keys to tuning for future folks to consider:
>>
>> First and foremost, be sure to configure your Java heap size accordingly when you launch Kafka. The default is like 512MB, which in my case left virtually all of my RAM inaccessible to Kafka.
>> Second, stay away from OpenJDK. No, seriously – this was a huge thorn in my side, and I almost gave up on Kafka because of the problems I encountered. The OpenJDK NIO functions repeatedly resulted in Kafka crashing and burning in dramatic fashion. The moment I switched over to Oracle's JDK for Linux, Kafka didn't puke once. I mean, not even a hiccup.
>> Third, know your message size. In my opinion, the more you understand about your event payload characteristics, the better you can tune the system. The two knobs to really turn are log.flush.interval and log.default.flush.interval.ms. The values here are intrinsically connected to the types of payloads you are putting through the system.
>> Fourth and finally, to maximize throughput you have to code against the async paradigm, and be prepared to tweak the batch size, queue properties, and compression codec (wait for it…) in a way that matches the message payload you are putting through the system and the capabilities of the producer system itself.
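
To make the third point concrete, here is a minimal sketch of how message size interacts with the two flush settings named above. It assumes that log.flush.interval is a count of messages accumulated on a log before a flush and that log.default.flush.interval.ms is the longest time messages wait before being flushed; check both against the documentation for your broker version. The function name, its parameters, and the example rates are hypothetical, and the rate is meant per log rather than for the whole cluster.

```python
def flush_profile(message_bytes, messages_per_second,
                  flush_interval_messages, flush_interval_ms):
    """Estimate how much data one flush carries and how often flushes happen.

    Hypothetical helper: whichever limit (message count or elapsed time)
    is reached first is assumed to trigger the flush.
    """
    # Messages that arrive before the time-based limit expires.
    messages_before_timeout = messages_per_second * flush_interval_ms / 1000
    # The smaller of the two limits decides when the flush happens.
    messages_per_flush = min(flush_interval_messages, messages_before_timeout)
    return {
        "bytes_per_flush": messages_per_flush * message_bytes,
        "seconds_between_flushes": messages_per_flush / messages_per_second,
    }


# 2048-byte messages as in the message above; the rates are made up.
busy = flush_profile(2048, 10_000, flush_interval_messages=500, flush_interval_ms=1000)
quiet = flush_profile(2048, 100, flush_interval_messages=500, flush_interval_ms=1000)
print(busy)
print(quiet)
```

With the same settings, a busy log hits the message-count limit and flushes small batches very frequently, while a quiet log is governed by the time limit. That is one way to read the advice that the right values depend on the payloads going through the system.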
>>
>>
>> Jason

Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr