Jim R. Wilson
2011-06-04, 18:27
Sean Bigdatafun
2011-06-04, 18:40
Jim R. Wilson
2011-06-04, 18:49
Himanshu Vashishtha
2011-06-04, 19:02
Himanshu Vashishtha
2011-06-04, 19:16
Jim R. Wilson
2011-06-04, 19:25
Andrew Purtell
2011-06-04, 20:30
Jim R. Wilson
2011-06-05, 01:48
Dave Viner
2011-06-05, 05:01
George P. Stathis
2011-06-09, 01:21
Gaurav Kohli
2011-06-09, 05:59
Himanshu Vashishtha
2011-06-23, 21:33
-
Best practices for HBase in EC2? Jim R. Wilson 2011-06-04, 18:27
Hi HBase community,
What are the current best practices for starting up an HBase cluster in EC2? I don't see any public AMIs newer than 0.89.xxx, and when I start that one up, it's clear that it's not configured for HDFS or clustering (empty hbase-site.xml).

Do people generally keep data in S3 or HDFS? If the latter, is it persisted via EBS? Do the Hadoop nodes have more than one EBS volume attached to distinguish HDFS from the OS?

Any help is much appreciated. Thanks in advance!

-- Jim R. Wilson (jimbojw)
-
Re: Best practices for HBase in EC2? Sean Bigdatafun 2011-06-04, 18:40
Here are my thoughts:
If your data storage is for long-term use, you may consider putting HDFS storage on EBS volumes rather than local disk (and putting the NameNode's storage on EBS as well). With that setup, I think we could use dfs.replication=2 (or even 1), because EBS itself already provides enough reliability.

If your datastore is for ephemeral purposes (say, EMR computation), you may consider just using the ephemeral disks that EC2 provides.

--Sean
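Sean's EBS suggestion comes down to a few HDFS settings. The Python sketch below is only an illustration: it builds an hdfs-site.xml that sets dfs.replication and points the DataNode and NameNode directories at EBS mounts. The mount paths are hypothetical, and the directory property names (dfs.data.dir, dfs.name.dir) are the Hadoop 0.20-era ones that HBase 0.90 clusters typically ran on; later Hadoop releases renamed them dfs.datanode.data.dir and dfs.namenode.name.dir.

```python
import xml.etree.ElementTree as ET


def hdfs_site_xml(replication, data_dirs, name_dir):
    """Return an hdfs-site.xml document (as a string) for EBS-backed storage.

    data_dirs and name_dir are hypothetical EBS mount paths; substitute the
    paths where your volumes are actually mounted.
    """
    if replication < 1:
        raise ValueError("dfs.replication must be at least 1")
    props = [
        # Default replica count for newly written files.
        ("dfs.replication", str(replication)),
        # Comma-separated local directories where each DataNode stores blocks.
        ("dfs.data.dir", ",".join(data_dirs)),
        # Where the NameNode keeps its fsimage and edit log.
        ("dfs.name.dir", name_dir),
    ]
    conf = ET.Element("configuration")
    for name, value in props:
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(conf, encoding="unicode")


if __name__ == "__main__":
    print(hdfs_site_xml(2, ["/mnt/ebs0/hdfs/data", "/mnt/ebs1/hdfs/data"],
                        "/mnt/ebs0/hdfs/name"))
```

Note that dfs.replication only sets the default for files written afterwards; files already in HDFS keep their replication factor unless it is changed with hadoop fs -setrep.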
-
Re: Best practices for HBase in EC2? Jim R. Wilson 2011-06-04, 18:49
Thanks Sean,
That's helpful. I probably should have added some context. In my case, I'm interested in providing instructions on how one can fire up an HBase cluster in EC2 in order to experiment with it: load data, practice administration, etc. In that context, the person following the instructions is unlikely to start more than 5 nodes, and also unlikely to keep them running for longer than an hour.

I saw archived email threads where people recommended against running on EC2 for any length of time, since you can get better performance per cost from dedicated hardware (for example, from Rackspace).

So I guess my real question is this: what is the easiest possible way to start a 5-node HBase 0.90.x cluster in EC2? I'm thinking that S3 is better for storage, but I'm open to whatever is genuinely the easiest thing to do.

Thanks again,
-- Jim
-
Re: Best practices for HBase in EC2? Himanshu Vashishtha 2011-06-04, 19:02
I used EC2, but just for experiments. Here is what I did:
a) Used the ephemeral disks. My experiment datasets were persisted on S3, and I copied them onto the cluster.
b) Used the hbase-ec2 scripts; get this repo: https://github.com/ekoontz/hbase-ec2.git
c) Consulted Andrew's PDF: hbase.s3.amazonaws.com/hbase/HBase-EC2-HUG9.pdf

For the AMI, there is a create-hbase-image script in the above git repo. I created one for my own work and it's public; search for "himanshu-hbase" and you should find it. But it's always good to have your own AMI (I learned that the hard way).

Consult the run scripts, like bin/launch-hbase-cluster, bin/launch-hbase-master, etc. One catch: after launching the cluster, everything was set up, but I needed to manually add the region servers' internal IPs to the master's conf/regionservers file, and likewise the datanodes' entries to conf/slaves in the Hadoop directory. This can be scripted, though.

Hope this helps.
Himanshu
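Himanshu's manual step (adding the workers' internal IPs to the master's conf/regionservers and to Hadoop's conf/slaves) is easy to script. A minimal Python sketch, assuming you already have the private IPs in hand (collecting them from the EC2 API is left out) and that the two conf directories are whatever paths you pass in; the function name write_host_lists is hypothetical:

```python
import tempfile
from pathlib import Path


def write_host_lists(internal_ips, hbase_conf_dir, hadoop_conf_dir):
    """Write the given hosts, one per line, to HBase's regionservers file
    and Hadoop's slaves file, returning the paths written.

    internal_ips is assumed to already contain the workers' private IPs.
    """
    # Strip whitespace, skip blanks, and drop duplicates while keeping order.
    hosts = list(dict.fromkeys(ip.strip() for ip in internal_ips if ip.strip()))
    body = "".join(host + "\n" for host in hosts)
    targets = [
        Path(hbase_conf_dir) / "regionservers",
        Path(hadoop_conf_dir) / "slaves",
    ]
    for target in targets:
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(body)
    return targets


if __name__ == "__main__":
    # Demo against a throwaway directory instead of a real install.
    with tempfile.TemporaryDirectory() as tmp:
        written = write_host_lists(["10.0.0.5", "10.0.0.6"],
                                   f"{tmp}/hbase/conf", f"{tmp}/hadoop/conf")
        for path in written:
            print(path.name, path.read_text().split())
```

HBase's start scripts read conf/regionservers and Hadoop's start-dfs.sh reads conf/slaves, one host per line, so after rewriting the files on the master you would start (or restart) the affected daemons.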
-
Re: Best practices for HBase in EC2? Himanshu Vashishtha 2011-06-04, 19:16
I should add a disclaimer: this is not the best possible way! :))
There are some Ruby scripts too (in the same repo, look in the recipes directory), and with those your cluster is up and running from just one .rb file. I didn't use them because Ruby is unknown territory for me and I was not entirely clear about how they work.

Himanshu
-
Re: Best practices for HBase in EC2? Jim R. Wilson 2011-06-04, 19:25
Thanks Himanshu,
Sounds like I'll need to make my own AMIs :/ It's been a really long time since I've rolled HBase AMIs. Last time I did it, though, one of the reasons was so I wouldn't have to deal with manual IP configuration. I'll see if my AMIs can be flexible enough to join a cluster through startup data alone.

-- Jim
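An AMI that joins a cluster from startup data alone, as Jim describes, could work roughly like this. The sketch assumes a hypothetical user-data format of one key=value pair per line (master_host, plus optional namenode_port and zk_quorum; all three key names are invented for this example) and turns it into the core hbase-site.xml properties for a distributed setup: hbase.rootdir, hbase.cluster.distributed and hbase.zookeeper.quorum. 169.254.169.254 is EC2's instance metadata address; the NameNode port 8020 is an assumed default.

```python
import urllib.request
import xml.etree.ElementTree as ET

# EC2 instance metadata endpoint for user data (reachable only from inside an instance).
USER_DATA_URL = "http://169.254.169.254/latest/user-data"


def fetch_user_data(timeout=2.0):
    """Read raw user data. Instances that require IMDSv2 need a session token first; not handled here."""
    with urllib.request.urlopen(USER_DATA_URL, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_user_data(text):
    """Parse the hypothetical 'key=value per line' format; '#' lines are comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {line!r}")
        settings[key.strip()] = value.strip()
    return settings


def hbase_site_xml(settings):
    """Build hbase-site.xml text pointing this node at the master named in user data."""
    master = settings["master_host"]                      # hypothetical key
    namenode_port = settings.get("namenode_port", "8020")  # assumed default port
    props = [
        ("hbase.rootdir", f"hdfs://{master}:{namenode_port}/hbase"),
        ("hbase.cluster.distributed", "true"),
        ("hbase.zookeeper.quorum", settings.get("zk_quorum", master)),
    ]
    conf = ET.Element("configuration")
    for name, value in props:
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(conf, encoding="unicode")
```

On a booting instance this would be chained as hbase_site_xml(parse_user_data(fetch_user_data())) and written into HBase's conf directory before the daemons start, so the same AMI can serve any cluster.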
-
Re: Best practices for HBase in EC2? Andrew Purtell 2011-06-04, 20:30
I recommend you look at Whirr:
http://incubator.apache.org/whirr/
specifically:
http://www.philwhln.com/run-the-latest-whirr-and-deploy-hbase-in-minutes

- Andy
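Whirr describes a cluster in a properties file. As a rough sketch for the 5-node case Jim asked about (one master plus four workers), the helper below generates such a file. The role names are assumed to match Whirr's bundled HBase recipe (recipes/hbase-ec2.properties in 0.5-era releases) and the provider id aws-ec2 is likewise an assumption; check both against the recipes shipped with the Whirr version you run. The AWS keys are left as ${env:...} placeholders for Whirr to expand from environment variables.

```python
MASTER_ROLES = ["zookeeper", "hadoop-namenode", "hadoop-jobtracker", "hbase-master"]
WORKER_ROLES = ["hadoop-datanode", "hadoop-tasktracker", "hbase-regionserver"]


def whirr_hbase_properties(cluster_name, worker_count, provider="aws-ec2"):
    """Return the text of a Whirr properties file for one master plus N workers."""
    if worker_count < 1:
        raise ValueError("need at least one worker")
    # Each template is "<count> <role>+<role>..."; templates are comma-separated.
    templates = ",".join([
        f"1 {'+'.join(MASTER_ROLES)}",
        f"{worker_count} {'+'.join(WORKER_ROLES)}",
    ])
    lines = [
        f"whirr.cluster-name={cluster_name}",
        f"whirr.instance-templates={templates}",
        f"whirr.provider={provider}",
        # Plain strings: Whirr, not Python, expands ${env:...} at launch time.
        "whirr.identity=${env:AWS_ACCESS_KEY_ID}",
        "whirr.credential=${env:AWS_SECRET_ACCESS_KEY}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Jim's 5-node case: 1 master + 4 workers.
    print(whirr_hbase_properties("hbase-sandbox", 4), end="")
```

The generated file would then be passed to Whirr's command-line tool, along the lines of bin/whirr launch-cluster --config hbase-ec2.properties, with destroy-cluster tearing everything down after the hour-long session Jim has in mind.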
-
Re: Best practices for HBase in EC2? Jim R. Wilson 2011-06-05, 01:48
Thanks for this! I'm definitely going to give it a try; it sounds like exactly what I need.

-- Jim
-
Re: Best practices for HBase in EC2? Dave Viner 2011-06-05, 05:01
I believe Cloudera also offers an AMI that has HBase installed, do they not?
I'm not sure if it's 0.90.x, but it might work as well.
-
Re: Best practices for HBase in EC2? George P. Stathis 2011-06-09, 01:21
Jim, I'd be interested in hearing about your experience with Whirr when you try it. I've been testing it for the last couple of days and I haven't been able to get the out-of-the-box Hadoop recipe to work when the cluster comes up (the namenode doesn't have any datanodes configured, although they are all up and running). Maybe you'll have better luck? I've tried the Whirr 0.3.0 distro that comes with CDH3 as well as the recent 0.5.0 tarball from the Apache mirrors, and I ran into issues with the recipes included in each. The only thing I haven't tried yet is building from the latest source.

-GS
-
RE: Best practices for HBase in EC2? Gaurav Kohli 2011-06-09, 05:59
Can anyone comment on the performance of EC2's recently released "Cluster Compute Instances"? They provide 10 Gigabit Ethernet, which was the main issue with the previous instances, and are customized for low-latency inter-node communication. We are planning to start a project and want to evaluate whether going to production on Cluster Compute instances is fine, or whether, as suggested earlier in this thread, Rackspace would be the better option.

- Gaurav
-
Re: Best practices for HBase in EC2? Himanshu Vashishtha 2011-06-23, 21:33
Hey Wilson, I will be rerunning experiments on EC2 and would be interested to know about your experience with Whirr, in case you tried it. I'm interested in bash scripts vs. Whirr for a scenario where all one needs is to start a cluster, run some experiments, and then terminate it. Running experiments may include changing the cluster configuration, such as adding or removing regionservers.

Thanks,
Himanshu