Kafka >> mail # user >> Relationship between Zookeeper and Kafka


Re: Relationship between Zookeeper and Kafka
Cool.

By the way, I do mean you should use 'atop'. That was not a typo on my part.

http://www.atoptool.nl/downloadatop.php

apt-get install atop

on Ubuntu systems.

Philip

On May 21, 2013, at 4:51 PM, Jason Weiss <[EMAIL PROTECTED]> wrote:

> Philip,
>
> Thanks for the response. I used top yesterday and determined that part of
> my problem was that the Kafka shell script is pre-configured to only use
> 512M of RAM, and thus it wasn't using memory efficiently. That has helped
> out tremendously. Adding an echo at the start of the script that it was
> defaulting to such a low value probably would have saved me some time. In
> the same vein, I should have inspected the launch command more closely.
>
> The virtualization of AWS makes it difficult to truly know what your
> performance is, IMHO. There are lots of people arguing on the web about
> the value of bare metal versus virtualization. I am still baffled how
> companies like Urban Airship are purportedly seeing bursts of 750,000
> messages per second on a 3-machine cluster, but by playing with the knobs
> in a controlled manner, I'm starting to better understand the relationship
> and effect on the overall system.
>
> Jason
>
>
> On 5/21/13 11:44 AM, "Philip O'Toole" <[EMAIL PROTECTED]> wrote:
>
>> As a test, why not just use a disk with 4000 provisioned IOPS? Just as
>> a test - see if it improves.
>>
>> Also, you have not supplied any metrics regarding the VM's performance.
>> Is the CPU busy? Is IO maxed out? Network? Disk? Use a tool like atop,
>> and tell us what you find.
>>
>> Philip
>>
>> On May 20, 2013, at 6:43 PM, Ken Krugler <[EMAIL PROTECTED]> wrote:
>>
>>> Hi Jason,
>>>
>>> On May 20, 2013, at 10:01am, Jason Weiss wrote:
>>>
>>>> Hi Scott.
>>>>
>>>> I'm using Kafka 0.7.2. I am using the default replication factor, since I
>>>> don't recall changing that configuration at all.
>>>>
>>>> I'm using provisioned IOPS, which from attending the AWS event in NYC a
>>>> few weeks ago was presented as the "fastest storage option" for EC2. A
>>>> number of partners presented success stories in terms of throughput with
>>>> provisioned IOPS. I've tried to follow that model.
>>>
>>> In my experience directly hitting an ephemeral drive on m1.large is
>>> faster than using EBS.
>>>
>>> I've seen some articles where RAIDing multiple EBS volumes can exceed
>>> the performance of ephemeral drives, but with high variability.
>>>
>>> If you want to maximize performance, set up a (smaller) cluster of
>>> SSD-backed instances with 10Gb Ethernet in the same cluster group.
>>>
>>> E.g. test with three cr1.8xlarge instances.
>>>
>>> -- Ken
>>>
>>>
>>>> On 5/20/13 12:56 PM, "Scott Clasen" <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> My guess, EBS is likely your bottleneck.  Try running on instance local
>>>>> disks, and compare your results.  Is this 0.8? What replication factor
>>>>> are you using?
>>>>>
>>>>>
>>>>> On Mon, May 20, 2013 at 8:11 AM, Jason Weiss <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> I'm trying to maximize my throughput and seem to have hit a ceiling.
>>>>>> Everything described below is running in AWS.
>>>>>>
>>>>>> I have configured a Kafka cluster with 5 machines, M1.Large, with 600
>>>>>> provisioned IOPS storage for each EC2 instance. I have a Zookeeper
>>>>>> server (we aren't in production yet, so I didn't take the time to set
>>>>>> up a ZK cluster). Publishing to a single topic from 7 different
>>>>>> clients, I seem to max out at around 20,000 eps with a fixed 2K
>>>>>> message size. Each broker defines 10 file segments, with a 25000
>>>>>> message / 5 second flush configuration in server.properties. I have
>>>>>> stuck with 8 threads. My producers (Java) are configured with
>>>>>> batch.num.messages at 50, and queue.buffering.max.messages at 100.
>>>>>>
>>>>>> When I went from 4 servers in the cluster to 5 servers, I only saw an
>>>>>> increase of about 500 events per second in throughput. In sharp
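
Editor's note: the figures in the original post (about 20,000 events per second, 2K messages, 5 brokers, 600 provisioned IOPS per instance, and a gain of roughly 500 events per second when going from 4 to 5 servers) can be turned into a rough per-broker load estimate. The sketch below is only a back-of-the-envelope calculation, not a measurement. It assumes that "2K" means 2048 bytes, that messages are spread evenly across brokers, and that every provisioned I/O operation is available for log writes. It ignores batching, the page cache, and the flush settings. The function names are made up for illustration and do not come from Kafka or AWS.

```python
def throughput_summary(events_per_sec, message_bytes, brokers, iops_per_broker):
    """Rough per-broker load, assuming an even spread of messages across brokers."""
    total_bytes = events_per_sec * message_bytes
    per_broker_bytes = total_bytes / brokers
    return {
        "total_mib_per_sec": total_bytes / 2**20,
        "per_broker_mib_per_sec": per_broker_bytes / 2**20,
        "per_broker_events_per_sec": events_per_sec / brokers,
        # Average payload each I/O operation would have to carry if the
        # volume were running at exactly its provisioned IOPS.
        "avg_kib_per_io": per_broker_bytes / iops_per_broker / 1024,
    }


def marginal_scaling(eps_after, eps_gain, brokers_after):
    """Gain from the newest broker as a fraction of one existing broker's share."""
    eps_before = eps_after - eps_gain
    per_broker_before = eps_before / (brokers_after - 1)
    return eps_gain / per_broker_before


# Numbers quoted in the original post; 2048 bytes for "2K" is an assumption.
summary = throughput_summary(
    events_per_sec=20_000, message_bytes=2048, brokers=5, iops_per_broker=600
)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")

print(f"marginal_scaling: {marginal_scaling(20_000, 500, 5):.1%}")
```

Under these assumptions the cluster as a whole is absorbing about 39 MiB/s, or roughly 7.8 MiB/s and 4,000 events per second per broker. That works out to about 13.3 KiB per I/O operation at 600 IOPS. The fifth server added only about 10% of what each of the first four was already handling. Whether 13.3 KiB per operation is close to the volume's limit depends on the I/O size that EBS counts as one operation, which the thread does not state. Collecting disk and CPU figures with a tool such as atop, as suggested above, is the way to confirm where the ceiling actually is.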

 