

Re: Cloudera BASE (+ZooKeeper), Hadoop HDFS, MapReduce, EC2 instances selection
Running on EC2 has been discussed on the list quite a bit in the past, so
you might want to do some searches on the archives.  Here are a few threads
I pulled up:

http://search-hadoop.com/m/paQmKTxSgj

http://search-hadoop.com/m/7E9PaA6U1V

http://search-hadoop.com/m/sGXTATdlIg2

On isolation: it appears that only c1.xlarge, m2.4xlarge and cc1.4xlarge
instances will get you a physical server for each instance, so you will pay
the least IO virtualization "tax" using these with instance storage.  But
even then, expect reduced IO performance compared with physical hardware.

For the node layout, I'd suggest something like:

1 - NameNode, JobTracker, ZooKeeper, HMaster
1 - SecondaryNameNode, HMaster
3 - DataNode, TaskTracker, RegionServer

You could run more ZK instances on smaller instance types (m1.medium?), but
beware that these could be more subject to erratic IO throughput due to
other instances running on the same physical server, which could hurt
ZooKeeper performance and overall cluster stability.  So for a cluster this
small, I don't think I would bother.
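
To put numbers on that trade-off: a ZooKeeper ensemble stays available only
while a strict majority of its servers is up, so one instance tolerates no
failures and three tolerate one.  A minimal sketch of the arithmetic (the
function names are mine, not part of any ZooKeeper API):

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble with this many servers."""
    if ensemble_size < 1:
        raise ValueError("ensemble must have at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be lost while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(f"{n} server(s): quorum={quorum_size(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

So the single-ZK layout above trades away fault tolerance for simplicity,
which seems reasonable for a test cluster of this size.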

For instance types, it'll depend on your workload and memory requirements.
I usually use c1.xlarge for HBase testing, but those have somewhat limited
memory, so you'll be constrained on the number of MR tasks you can run
without overcommitting memory (you want to avoid swapping at all costs).
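
A rough way to budget this, as a sketch only: subtract the daemon heaps and
an OS reserve from the instance memory, then divide what is left by the
per-task heap.  The c1.xlarge figure (about 7 GB) is from the EC2 specs; all
heap sizes below are placeholder assumptions, not recommendations, and real
JVMs use somewhat more than their configured heap.

```python
def max_task_slots(total_mb, daemon_heaps_mb, os_reserve_mb, task_heap_mb):
    """Number of MR task JVMs that fit without overcommitting memory."""
    free_mb = total_mb - sum(daemon_heaps_mb.values()) - os_reserve_mb
    return max(0, free_mb // task_heap_mb)


if __name__ == "__main__":
    # Hypothetical heap settings for one worker node (assumptions).
    daemons = {"DataNode": 1024, "TaskTracker": 1024, "RegionServer": 3072}
    slots = max_task_slots(
        total_mb=7 * 1024,      # c1.xlarge, roughly 7 GB
        daemon_heaps_mb=daemons,
        os_reserve_mb=1024,     # assumed headroom for OS and page cache
        task_heap_mb=512,       # assumed child JVM heap per task
    )
    print("map + reduce slots that fit:", slots)
```

With these assumed numbers only 2 task slots fit, which is why memory rather
than CPU tends to be the limit on c1.xlarge.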

I would say to do some testing with your workload and see what instance
types give you the best performance at an acceptable price.

--gh
On Thu, Sep 15, 2011 at 2:01 AM, Ronen Itkin <[EMAIL PROTECTED]> wrote:

>  Hi,
>
> I am wondering if someone can recommend best practices for selecting
> the right combination of Amazon EC2 instances for the following
> implementation:
>
> Cloudera Hadoop HDFS and MapReduce:
>
>   - 1 NameNode + JobTracker server.
>   - 1 SecondaryNameNode server.
>   - 3 DataNodes + TaskTrackers.
>
>
> Cloudera HBase:
>
>   - 2 HMaster servers
>   - 3 ZooKeeper Servers
>   - 2 Region Servers.
>
>
> From your own experience, which Amazon EC2 instances should I choose?
> How would you combine and place the above implementation across the
> instances?
> Should I place the DataNode and TaskTracker on the same instance as the
> HRegionServer?
>
> Thanks !
>
> --
> *Ronen.*
>
> <http://www.taykey.com/>
>