Re: Cloudera BASE (+ZooKeeper), Hadoop HDFS, MapReduce, EC2 instances selection
Gary Helmling 2011-09-15, 19:34
Running on EC2 has been discussed on the list quite a bit in the past, so
you might want to do some searches on the archives. Here are a few threads
I pulled up:
For instance types, it appears that only c1.xlarge, m2.4xlarge and
cc1.4xlarge instances will get you a physical server for each instance, so
you will pay the least IO virtualization "tax" using these with instance
storage. But even with that, expect reduced IO performance vs. physical
hardware.
For the node layout, I'd suggest something like:
1 - NameNode, JobTracker, ZooKeeper, HMaster
1 - SecondaryNameNode, HMaster
3 - DataNode, TaskTracker, RegionServer
You could run more ZK instances on smaller instance types (m1.medium?), but
beware that these could be more subject to erratic IO throughput due to
other instances running on the same physical server, which could negatively
impact ZooKeeper performance and overall cluster stability. So for a
cluster this small, I don't think I would bother.
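To make the suggested layout easier to check, here is a minimal Python
sketch that writes it down as data and counts instances and daemons. The
names LAYOUT, daemon_counts and total_instances are made up for
illustration and are not part of any Hadoop or HBase tooling.

```python
from collections import Counter

# Node layout from this message: (number of instances, daemons on each one).
LAYOUT = [
    (1, ["NameNode", "JobTracker", "ZooKeeper", "HMaster"]),
    (1, ["SecondaryNameNode", "HMaster"]),
    (3, ["DataNode", "TaskTracker", "RegionServer"]),
]


def total_instances(layout):
    """Total number of EC2 instances the layout needs."""
    return sum(count for count, _ in layout)


def daemon_counts(layout):
    """How many copies of each daemon run across the whole cluster."""
    counts = Counter()
    for count, daemons in layout:
        for daemon in daemons:
            counts[daemon] += count
    return counts


if __name__ == "__main__":
    print("instances:", total_instances(LAYOUT))
    for daemon, n in sorted(daemon_counts(LAYOUT).items()):
        print(f"{daemon}: {n}")
```

This comes out to 5 instances, with 2 HMasters, 1 ZooKeeper and 3
RegionServers, which is the single-ZK arrangement described above.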
For instance types, it'll depend on your workload and memory requirements.
I usually use c1.xlarge for HBase testing, but those have somewhat limited
memory, so you'll be constrained on the number of MR tasks you can run
without overcommitting memory (you want to avoid swapping at all costs).
I would say to do some testing with your workload and see what instance
types give you the best performance at an acceptable price.
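As a rough way to reason about the memory constraint, the sketch below
estimates how many MR task JVMs fit on a worker node once the fixed
daemons are accounted for. All heap sizes and the OS reserve are assumed
numbers chosen for illustration, not recommendations, and the 7 GB figure
for c1.xlarge should be checked against the current EC2 documentation.
Heap size also understates a JVM's real footprint, so treat the result as
an upper bound.

```python
def max_task_slots(total_ram_mb, os_reserve_mb, daemon_heaps_mb, task_heap_mb):
    """Number of MR task JVMs that fit in RAM next to the fixed daemons.

    Returns 0 if the daemons alone already use up the available memory.
    """
    available = total_ram_mb - os_reserve_mb - sum(daemon_heaps_mb.values())
    if available <= 0:
        return 0
    return available // task_heap_mb


# Assumption: c1.xlarge has 7 GB of RAM.
C1_XLARGE_RAM_MB = 7 * 1024

# Assumption: illustrative heap sizes for the daemons on a worker node.
WORKER_DAEMON_HEAPS_MB = {
    "DataNode": 1024,
    "TaskTracker": 1024,
    "RegionServer": 3072,
}

slots = max_task_slots(
    total_ram_mb=C1_XLARGE_RAM_MB,
    os_reserve_mb=512,        # assumption: headroom for the OS
    daemon_heaps_mb=WORKER_DAEMON_HEAPS_MB,
    task_heap_mb=512,         # assumption: heap per map/reduce task JVM
)

if __name__ == "__main__":
    print("task slots without overcommitting:", slots)
```

With these assumed numbers only 3 task slots fit, which shows how quickly
a RegionServer plus MR tasks can fill a node with limited memory.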
On Thu, Sep 15, 2011 at 2:01 AM, Ronen Itkin <[EMAIL PROTECTED]> wrote:
> I am wondering if someone can recommend best practices for selecting
> the right Amazon EC2 instance combination for the following
> Cloudera Hadoop HDFS and MapReduce:
> - 1 NameNode + JobTracker servers.
> - 1 SecondaryNameNode server.
> - 3 DataNodes + TaskTrackers.
> Cloudera HBase:
> - 2 HMaster servers
> - 3 ZooKeeper Servers
> - 2 Region Servers.
> From your own experience what AMAZON EC2 instances should I choose?
> How would you combine and place the above implementation across the
> Should I place datanode & task tracker with HRegionServer on the same
> Thanks !