Hadoop >> mail # user >> choices for deploying a small hadoop cluster on EC2


Re: choices for deploying a small hadoop cluster on EC2
I'd suggest you use the bits produced by BigTop (cross-posting to the
bigtop-dev@ list), which also come with Puppet recipes allowing for fully
automated deployment and configuration. BigTop also uses the Jenkins EC2 plugin
for the deployment part, and it seems to work really well!

Cos

On Tue, Nov 29, 2011 at 12:28PM, Periya.Data wrote:
> Hi All,
>         I am just beginning to learn how to deploy a small cluster (a 3
> node cluster) on EC2. After some quick Googling, I see the following
> approaches:
>
>    1. Use Whirr for quick deployment and tearing down. Uses CDH3. Does it
>    have features for persisting (EBS)?
>    2. CDH Cloud Scripts - has EC2 AMI - again for temp Hadoop clusters/POC
>    etc. Good stuff - I can persist using EBS snapshots. But, this uses CDH2.
>    3. Install hadoop manually and related stuff like Hive...on each cluster
>    node...on EC2 (or use some automation tool like Chef). I do not prefer it.
>    4. The Hadoop distribution comes with EC2 scripts (under src/contrib) and there are
>    several Hadoop EC2 AMIs available. I have not studied enough to know if
>    that is easy for a beginner like me.
>    5. Anything else??
>
> 1 and 2 look promising as a beginner. If any of you have any thoughts about
> this, I would like to know (like what to keep in mind, what to take care
> of, caveats etc). I want my data/config to persist (using EBS) and
> continue from where I left off (after a few days). Also, I want to have
> Hive and Sqoop installed. Can this be done using 1 or 2? Or, will installation
> of them have to be done manually after I set up the cluster?
>
> Thanks very much,
>
> PD.