-
Kafka on EC2
Senthilvel Rangaswamy 2012-11-17, 01:04
Have folks implemented large installations of Kafka on Amazon EC2? I am
looking for best practices, such as the kind of nodes, EBS vs. instance store, etc.

--
..Senthil
"If there's anything more important than my ego around, I want it caught and shot now." - Douglas Adams.
-
Re: Kafka on EC2
Joel Koshy 2012-11-17, 01:57
At least based on prior threads (discussing experiences/issues with EC2),
there should be a number of people on this list who can help you. It would be helpful to have an EC2 operations wiki at https://cwiki.apache.org/confluence/display/KAFKA/Index. Would people be interested in sharing operational experiences there? I'm thinking of something similar to the operations wiki that's already available (https://cwiki.apache.org/confluence/display/KAFKA/Operations).

Joel
-
Re: Kafka on EC2
Bae, Jae Hyeon 2012-11-18, 23:26
I am running Kafka on EC2 with m1.large instances. I think that a large
number of low-end servers will outperform a small number of high-end servers, but I am not sure I am correct.

I assumed 12 m1.large instances would be able to handle more than 6 billion rows in a day, but my expectation was wrong. A single m1.large instance's capacity was 10k/sec. If we want some headroom, we'd better assume the maximum capacity of Kafka on m1.large is 8k/sec.
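The sizing question above comes down to simple arithmetic. A minimal sketch in Python, assuming the 6 billion rows arrive evenly over the day and that "10k/sec" and "8k/sec" mean rows per second per broker (the thread does not confirm either). `peak_factor` is a hypothetical knob for burstiness, not a figure from the thread.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def required_rate_per_sec(rows_per_day: float) -> float:
    """Average rows/sec needed to absorb a daily volume spread evenly."""
    return rows_per_day / SECONDS_PER_DAY


def brokers_needed(rows_per_day: float, per_broker_rate: float,
                   peak_factor: float = 1.0) -> int:
    """Brokers needed at a given per-broker rate.

    peak_factor is an assumed ratio of peak to average traffic.
    """
    peak_rate = required_rate_per_sec(rows_per_day) * peak_factor
    return math.ceil(peak_rate / per_broker_rate)


avg = required_rate_per_sec(6e9)
print(round(avg))                                   # 69444 rows/sec overall
print(round(avg / 12))                              # 5787 rows/sec per instance
print(brokers_needed(6e9, 8_000))                   # 9 brokers at 8k/sec, flat traffic
print(brokers_needed(6e9, 8_000, peak_factor=2.0))  # 18 brokers if peaks are 2x average
```

On these assumptions the average load per instance is below the reported 10k/sec, so one possible reading is that peaks, not the daily average, were the problem; the thread does not say how traffic was distributed.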
-
Re: Kafka on EC2
Neha Narkhede 2012-11-18, 23:36
>> Single m1.large instance's capacity was 10k/sec.
When you say capacity, did you mean the I/O or network capacity on the m1.large instances?

Thanks,
Neha
-
Re: Kafka on EC2
David Arthur 2012-11-19, 19:30
I'd only consider m1.xlarge and higher for Kafka. The m1.xlarge has "high" I/O performance according to Amazon. This covers both disk I/O and network I/O performance. Of course you need to use EBS volumes if you want your Kafka brokers to survive reboots - you can expect reboots on AWS. Some people have reported I/O improvements by RAIDing EBS volumes (http://alestic.com/2009/06/ec2-ebs-raid). Deploying in the same region as your application will also improve performance.
-
Re: Kafka on EC2
Matthew Rathbone 2012-11-19, 19:44
We RAID-0 4 EBS disks. We find that to be most performant, although it does
leave you more vulnerable to EBS network errors and outages. Ideally, if you're spread across AZs you could do some clever routing to geographically local brokers, whilst keeping the others as a backup in case of failure.

In the same vein, we have N+1 brokers, where N is how many we reasonably think we need. This way we can hopefully survive outages.

--
Matthew Rathbone
Foursquare | Software Engineer | Server Engineering Team
[EMAIL PROTECTED] | @rathboma <http://twitter.com/rathboma> | 4sq <http://foursquare.com/rathboma>
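The trade-off described above (striping buys throughput but widens exposure to EBS problems) can be put in rough numbers. A sketch, assuming each volume is unavailable independently with the same probability. The 1% figure is purely illustrative, and EBS outages are often correlated, so this is intuition rather than a measurement.

```python
def stripe_unavailable_prob(p_volume: float, n_volumes: int) -> float:
    """Probability that a RAID-0 stripe is unavailable.

    RAID-0 has no redundancy, so the array is down if ANY member volume
    is down. Independence between volumes is an assumption.
    """
    if not 0.0 <= p_volume <= 1.0:
        raise ValueError("p_volume must be a probability")
    if n_volumes < 1:
        raise ValueError("n_volumes must be at least 1")
    return 1.0 - (1.0 - p_volume) ** n_volumes


p = 0.01  # hypothetical per-volume unavailability
for n in (1, 2, 4):
    print(n, round(stripe_unavailable_prob(p, n), 4))
# 1 0.01
# 2 0.0199
# 4 0.0394
```

With four volumes the stripe is roughly four times as likely to be affected as a single volume, which is the vulnerability the post refers to.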
-
Re: Kafka on EC2
James A. Robinson 2012-11-19, 19:56
On Mon, Nov 19, 2012 at 11:44 AM, Matthew Rathbone
<[EMAIL PROTECTED]> wrote:
> In the same vein, we have N+1 brokers, where N is how many we
> reasonably think we need, this way we can hopefully survive outages.

So you're using N+1 live brokers, meaning you may then lose some data if one of the brokers goes down (any data from the broker that has yet to be consumed), but at least your overall service will keep running?

That's the model we're thinking of using as well. At first I was thinking about setting up mirroring, but reviewing the notes appeared to me to indicate that you've got the same vulnerability with mirroring (where MirrorMaker might not have read all the data).

Jim

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
James A. Robinson [EMAIL PROTECTED]
Stanford University HighWire Press http://highwire.stanford.edu/
+1 650 7237294 (Work) +1 650 7259335 (Fax)
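The N+1 model and the data-loss window discussed above can be sketched as follows. All inputs are assumptions for illustration: the 70,000 msgs/sec peak and the 30-second consumer lag are hypothetical, and 8,000 msgs/sec per broker is the planning figure quoted earlier in the thread.

```python
import math


def brokers_n_plus_one(peak_rate: float, per_broker_rate: float) -> int:
    """N brokers to carry the peak, plus one spare."""
    n = math.ceil(peak_rate / per_broker_rate)
    return n + 1


def load_per_broker(peak_rate: float, live_brokers: int) -> float:
    """Per-broker load if traffic is spread evenly over the live brokers."""
    return peak_rate / live_brokers


def messages_at_risk(broker_rate: float, consumer_lag_sec: float) -> float:
    """Messages on one broker not yet consumed when it goes down."""
    return broker_rate * consumer_lag_sec


peak = 70_000.0      # hypothetical peak, msgs/sec
capacity = 8_000.0   # per-broker planning figure, msgs/sec

total = brokers_n_plus_one(peak, capacity)                      # 10 brokers
after_failure = load_per_broker(peak, total - 1)                # about 7778 msgs/sec
at_risk = messages_at_risk(load_per_broker(peak, total), 30.0)  # 210000 messages

print(total, round(after_failure), round(at_risk))
```

The spare broker keeps the survivors under their planning capacity after one failure. It does not remove the window of unconsumed data on the failed broker, which is the point raised in the post.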
-
Re: Kafka on EC2
Senthilvel Rangaswamy 2012-11-19, 20:17
Ephemeral store is intact during reboots. Of course, if your instance dies,
you lose the data on the ephemeral store. Though you get some boost using EBS in a RAID-0 config, it is still inferior to the performance you get on an ephemeral store. It'll be interesting to try out those Provisioned IOPS.
-
Re: Kafka on EC2
Matt Jones 2012-11-19, 21:34
We have pretty much the same setup as this, running on m1.large instances.

On Mon, Nov 19, 2012 at 11:44 AM, Matthew Rathbone <[EMAIL PROTECTED]> wrote:
> We RAID-0 4 EBS disks, we find that to be most performant, although it does
> leave you more vulnerable to EBS network errors and outages. Ideally, if
> you're spread across AZ's you could do some clever routing to
> geographically local brokers, whilst keeping the others as a backup in case
> of failure.
>
> In the same vein, we have N+1 brokers, where N is how many we reasonably
> think we need, this way we can hopefully survive outages.
-
Re: Kafka on EC2
Matthew Rathbone 2012-11-19, 21:44
If you RAID EBS volumes you're pretty much protected from data loss as an
added benefit. So yeah, if a broker goes down, you don't have the data until you bring it back up, but you CAN bring it back up, even if the box dies.
-
Re: Kafka on EC2
Bae, Jae Hyeon 2012-11-20, 00:36
Yes, 12 m1.large instances couldn't handle more than 12k messages per
second in our environment. When the traffic went up to 12k/sec, the Kafka cluster started throttling. I am not sure how much one m1.xlarge will outperform two m1.large instances, because m1.xlarge is twice as expensive as m1.large. I vote that two m1.large will be better than one m1.xlarge.

Speaking of EBS volumes, EBS write performance is not good and it's expensive. I hope replication in 0.8 will save us.

On Sun, Nov 18, 2012 at 3:36 PM, Neha Narkhede <[EMAIL PROTECTED]> wrote:
>>> Single m1.large instance's capacity was 10k/sec.
>
> When you say capacity, did you mean the I/O or network capacity on the
> m1.large instances ?
-
Re: Kafka on EC2
David Arthur 2012-11-20, 15:52
In my experience, anything smaller than m1.xlarge isn't really suitable for I/O intensive high performance stuff. I would guess that, for Kafka, a single m1.xlarge would outperform two m1.large. I have no hard evidence to support this however.

What I'd like to see are some benchmarks comparing 12 m1.large to 6 m1.xlarge to 1 hi1.4xlarge.

Another interesting note is with the m1.xlarge you can get "optimized" EBS instances with a claimed 1000 Mbps I/O throughput.
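A benchmark like the one proposed above is easier to read once throughput is normalized by cost. A hedged sketch: the only price fact taken from the thread is that m1.xlarge costs about twice as much as m1.large. The per-instance rates are placeholders to be replaced by measured numbers: 10k/sec for m1.large is the figure reported earlier in the thread, and 20k/sec for m1.xlarge is simply the break-even value, not a measurement. hi1.4xlarge is left out because the thread gives no price or throughput for it.

```python
from dataclasses import dataclass


@dataclass
class Fleet:
    name: str
    instances: int
    relative_price: float  # price per instance in m1.large units (assumption)
    measured_rate: float   # msgs/sec per instance, to be filled in from a benchmark

    def total_rate(self) -> float:
        return self.instances * self.measured_rate

    def total_cost(self) -> float:
        return self.instances * self.relative_price

    def rate_per_cost(self) -> float:
        """Throughput bought per unit of spend; higher is better."""
        return self.total_rate() / self.total_cost()


fleets = [
    Fleet("12 x m1.large", 12, 1.0, 10_000.0),
    Fleet("6 x m1.xlarge", 6, 2.0, 20_000.0),  # placeholder: break-even rate
]

for fleet in fleets:
    print(fleet.name, fleet.total_rate(), fleet.total_cost(), fleet.rate_per_cost())
```

At twice the price, an m1.xlarge has to sustain more than twice the m1.large rate before six of them beat twelve of the smaller instances on throughput per unit of spend.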
-
Re: Kafka on EC2
Evan Chan 2012-11-20, 17:12
We use m1.larges with ephemeral storage and get 20 MB/sec using Kafka's
built-in benchmarking tool. No compression.

--
*Evan Chan*
Senior Software Engineer | [EMAIL PROTECTED] | (650) 996-4600
www.ooyala.com | blog <http://www.ooyala.com/blog> | @ooyala <http://www.twitter.com/ooyala>
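The thread quotes throughput in three different units (messages/sec, MB/sec and Mbps), which makes the reports hard to compare. A small conversion sketch, assuming decimal megabytes (1 MB = 1,000,000 bytes). The figures come from different environments, so this only shows what would have to hold for them to describe the same load.

```python
def mbps_to_mbytes_per_sec(mbps: float) -> float:
    """Convert megabits/sec to megabytes/sec (8 bits per byte)."""
    return mbps / 8.0


def implied_message_bytes(mbytes_per_sec: float, msgs_per_sec: float) -> float:
    """Average message size that would make the two rates equivalent."""
    return mbytes_per_sec * 1_000_000 / msgs_per_sec


print(mbps_to_mbytes_per_sec(1000))         # 125.0 MB/sec for a claimed 1000 Mbps
print(implied_message_bytes(20.0, 10_000))  # 2000.0 bytes per message
```

So 20 MB/sec and 10k/sec would describe the same load only if messages averaged about 2 KB. With smaller messages the two reports are not directly comparable.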