Weishung Chung
2011-03-10, 17:12
Gary Helmling
2011-03-10, 17:37
Ted Dunning
2011-03-10, 17:38
Weishung Chung
2011-03-10, 21:55
Andrew Purtell
2011-03-10, 23:30
Peter Haidinyak
2011-03-10, 23:46
Andrew Purtell
2011-03-11, 01:18
Lars George
2011-03-11, 17:45
-
cost estimation
Weishung Chung 2011-03-10, 17:12
I am trying to estimate the cost of hosting our own HBase cluster vs. using EC2. Could anyone give me some guidance?
Cluster size: ~6 to 8 nodes
Usage: at least 12 hours/day with a lot of read/write operations. (I know I need more concrete usage numbers here.)
Thank you so much :)
-
Re: cost estimation
Gary Helmling 2011-03-10, 17:37
Hi Weishung,

See the EC2 instance pricing details here: http://aws.amazon.com/ec2/#pricing and try to calculate it out vs. price quotes for hardware.

You'll need to run at _least_ m1.large or c1.xlarge instances for HBase. There was a recent discussion thread covering EC2 performance. You can look it up at search-hadoop.com.

If you don't need the cluster running 24x7, maybe you can make the EC2 pricing work out. Just be aware that you'll be taking a hit in raw IO performance per node, so you may need to balance that out with more nodes than you would need on your own hardware. If you need to persist data between cluster restarts, you'll also need either EBS or S3 storage, so be sure to factor that in. Also factor in bandwidth costs if you need to transfer a lot of data in/out of AWS.

My own impression is that EC2 is great and very cost effective for short-lived, on-demand computing resources. We use it a great deal for functional testing. For 24x7 services, it seems like you pay a premium long term over owning your own hardware. The advantages are no large up-front acquisition cost and easy elasticity to expand to meet demand; the cost is reduced performance per node due to virtualization.

The best advice I can give is to do some benchmarking to see how many nodes you need to satisfy your processing requirements in EC2 vs. on raw hardware, and then price the two out side by side.

--gh
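Gary's benchmarking advice boils down to a small sizing calculation: measure per-node throughput on each platform, then divide the required throughput by it. The sketch below is only an illustration of that arithmetic. The function name, the `headroom` margin, and the throughput figures in the usage note are hypothetical and do not come from the thread.

```python
import math


def nodes_needed(required_ops_per_sec: float,
                 measured_ops_per_node: float,
                 headroom: float = 0.25) -> int:
    """Estimate how many nodes a workload needs.

    required_ops_per_sec  -- target cluster throughput (from your own workload)
    measured_ops_per_node -- per-node throughput from your own benchmark
    headroom              -- extra capacity fraction; 0.25 is an assumed default
    """
    if measured_ops_per_node <= 0:
        raise ValueError("measured_ops_per_node must be positive")
    if headroom < 0:
        raise ValueError("headroom must not be negative")
    # Scale the target by the safety margin, then round up to whole nodes.
    return math.ceil(required_ops_per_sec * (1 + headroom) / measured_ops_per_node)
```

For example, if a benchmark showed 8000 ops/s per node on raw hardware and 5000 ops/s per node on EC2 (made-up figures), `nodes_needed(40000, 8000)` and `nodes_needed(40000, 5000)` give the two node counts to price out.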
-
Re: cost estimation
Ted Dunning 2011-03-10, 17:38
With no information whatsoever about the size of the data, I would guess a cost of about $4000 / node, with annual hosting and power requirements of about $2000/year.

This is probably accurate to no better than one order of magnitude, though it has a decent chance of being the right order of magnitude. In particular, you might want a lot of memory. It is unlikely you want a lot of disks.

You can do the math yourself on the EC2 costs.
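To make "do the math yourself" concrete, here is a minimal sketch that puts Ted's rough figures next to an EC2 on-demand estimate. It assumes the $2000/year hosting and power figure is per node, which the message does not state outright. The `hourly_rate` argument is a placeholder to be filled in from the EC2 pricing page, not a real price, and EBS/S3 storage and bandwidth charges are left out.

```python
def owned_cost(nodes: int, years: float,
               purchase_per_node: float = 4000.0,
               hosting_per_node_year: float = 2000.0) -> float:
    """Total cost of owned hardware, using Ted's order-of-magnitude guesses.

    Assumption: the yearly hosting/power figure applies per node.
    """
    return nodes * (purchase_per_node + hosting_per_node_year * years)


def ec2_on_demand_cost(nodes: int, years: float, hourly_rate: float,
                       hours_per_day: float = 12.0) -> float:
    """Instance-hours only; storage and data transfer are not included.

    hourly_rate is a placeholder: look up the current rate for your instance type.
    """
    return nodes * hourly_rate * hours_per_day * 365 * years
```

With 8 nodes over 3 years, `owned_cost(8, 3)` comes to 80000, and `ec2_on_demand_cost(8, 3, hourly_rate=0.5)` comes to 52560 for a purely illustrative rate of $0.50/hour at 12 hours/day. If EC2 needs more nodes to match the same throughput, pass the larger node count to the second function.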
-
Re: cost estimation
Weishung Chung 2011-03-10, 21:55
Thank you :)
I also found this cost calculator for EC2: http://calculator.s3.amazonaws.com/calc5.html
-
Re: cost estimation
Andrew Purtell 2011-03-10, 23:30
Everything Gary said.

Something interesting Netflix said this week at the ccevent conference was that they were able to depreciate Reserved Instance payments as a capital expenditure.

Also, c1.xlarge is one of only three instance types that seem to get their own physical server for each instance (the others are m2.4xlarge and cc1.xlarge iirc).
-
RE: cost estimation
Peter Haidinyak 2011-03-10, 23:46
I just took a day course on the Amazon Cloud, and the instructor mentioned that every time you spin up a VM it gets a different IP and host name. If this is true, how do you keep the configuration files current every time you add a new VM or power on an existing cluster?

Thanks

-Pete
-
RE: cost estimation
Andrew Purtell 2011-03-11, 01:18
Hi Peter,

We boot the master first, then boot the slaves after the master's IP address is known. Instances are initialized using user-data scripts. We do substitutions on config details when creating the user-data for the instances.

This is sufficient for transient/testing clusters. For a cluster that would run for a long time or needs to be reliable, you'd want a plan for when the master instance goes away. I think what would be relatively easy to do is grab an elastic IP (which also gives you a "well known" DNS name), assign it to the current master, then use RedHat Cluster Suite or similar with another instance as a hot spare, with DRBD replication of the fsimage from primary to secondary. Then the script that handles loss of the primary can remap the elastic IP and start a namenode on the secondary.

Best regards,

- Andy
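The "substitutions on config details" step Andy describes can be sketched as a template that is rendered once the master's address is known and then passed to each slave as user-data. This is an illustration only, not the scripts Andy's team uses: the config path `/opt/hbase/conf/hbase-site.xml` and the `@MASTER_HOST@` placeholder are hypothetical.

```python
import re
from string import Template

# Hypothetical boot script for slave instances. "$$" is Template's escape
# for a literal "$", so the rendered script keeps the shell variable intact.
USER_DATA_TEMPLATE = Template(
    "#!/bin/sh\n"
    "# Rendered after the master has booted and its address is known.\n"
    "MASTER_HOST='${master_host}'\n"
    'sed -i "s/@MASTER_HOST@/$$MASTER_HOST/g" /opt/hbase/conf/hbase-site.xml\n'
)

# Accept only plain host names and IPv4 addresses to keep the script safe.
_HOST_PATTERN = re.compile(r"^[A-Za-z0-9.-]+$")


def render_user_data(master_host: str) -> str:
    """Return the user-data script text for a slave pointing at master_host."""
    if not _HOST_PATTERN.match(master_host):
        raise ValueError(f"unexpected characters in host: {master_host!r}")
    return USER_DATA_TEMPLATE.substitute(master_host=master_host)
```

The rendered string would then be supplied as the user-data when launching each slave, through whichever EC2 tool or API you already use.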
-
Re: cost estimation
Lars George 2011-03-11, 17:45
Hi,

That is an interesting question, and I noticed the same: stopped instances (backed by EBS) get a new IP at start. The IP survives only a restart. I am not sure how to handle this other than adding some extra scripts to patch the configs on start. Messy.

Anyone with experience willing to chime in?

Lars
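One way the "patch the configs on start" script Lars mentions might look is a small function that rewrites a single property in a Hadoop-style XML config. This is a sketch under assumptions, not a tested recipe: which properties need patching depends on your setup, and `hbase.zookeeper.quorum` in the usage note is only one example of a host-bearing property. Note that re-serializing through ElementTree drops XML comments and the original formatting.

```python
import xml.etree.ElementTree as ET


def patch_property(xml_text: str, name: str, value: str) -> str:
    """Set <value> of the <property> whose <name> matches; return new XML text.

    Raises KeyError if the property is absent, so a boot script fails loudly
    instead of starting the cluster with stale addresses.
    """
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:
                # Property had a name but no value element; add one.
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            return ET.tostring(root, encoding="unicode")
    raise KeyError(name)
```

A boot script could read the config file, call something like `patch_property(text, "hbase.zookeeper.quorum", new_host)` with the address discovered at start, and write the result back before launching the daemons.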