First, you have to explain what you mean by 'equivalent'.
The short answer is that it depends.
The longer answer is that you have to consider cost in your design.
The whole point of the design is to maintain the correct ratios of cores to memory and cores to spindles while optimizing the box within your cost, space, and hardware (box configuration) limitations.
Note that you can sacrifice some of the ratio; however, you will then leave some performance on the table.
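
To make the ratios concrete, here is a rough sketch in Python that works through the numbers from the two setups quoted below. The class and field names are made up for illustration; it only does the arithmetic on cores, memory and spindles and says nothing about real-world performance.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Setup:
    """Hypothetical description of one slave-node configuration."""
    nodes: int
    cores_per_node: int
    ram_gb_per_node: int
    drives_per_node: int

    def totals(self):
        """Cluster-wide (cores, RAM in GB, spindles)."""
        return (
            self.nodes * self.cores_per_node,
            self.nodes * self.ram_gb_per_node,
            self.nodes * self.drives_per_node,
        )

    def per_core(self):
        """(GB of RAM per core, spindles per core) as exact fractions."""
        return (
            Fraction(self.ram_gb_per_node, self.cores_per_node),
            Fraction(self.drives_per_node, self.cores_per_node),
        )


# Figures taken from the question: Setup A and Setup B.
setup_a = Setup(nodes=28, cores_per_node=2, ram_gb_per_node=8, drives_per_node=1)
setup_b = Setup(nodes=7, cores_per_node=8, ram_gb_per_node=32, drives_per_node=4)

for name, s in (("A", setup_a), ("B", setup_b)):
    cores, ram_gb, spindles = s.totals()
    gb_per_core, spindles_per_core = s.per_core()
    print(f"Setup {name}: {cores} cores, {ram_gb} GB RAM, {spindles} spindles; "
          f"{gb_per_core} GB/core, {spindles_per_core} spindles/core")
```

Both setups come out the same on paper (56 cores, 224 GB RAM, 28 spindles, 4 GB and half a spindle per core), which is why the answer has to come from the things the ratios leave out, such as cost, space and box configuration.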
On Jul 1, 2012, at 6:13 AM, Safdar Kureishy wrote:
> I have a reasonably simple question that I thought I'd post to this list
> because I don't have enough experience with hardware to figure this out
> Let's assume that I have 2 separate cluster setups for slave nodes. The
> master node is a separate machine *outside* these clusters:
> *Setup A*: 28 nodes, each with a 2-core CPU, 8 GB RAM and 1 SATA drive (1
> TB each)
> *Setup B*: 7 nodes, each with an 8-core CPU, 32 GB RAM and 4 SATA drives (1
> TB each)
> Note that I have maintained the same *core:memory:spindle* ratio above. In
> essence, setup B has the same overall processing + memory + spindle
> capacity, but achieved with 4 times fewer nodes.
> Ignoring the *cost* of each node above, and assuming a 10Gb Ethernet
> connectivity and the same speed-per-core across nodes in both the scenarios
> above, are Setup A and Setup B equivalent to each other in the context of
> setting up a Hadoop cluster? Or will the relative performance be different?
> Excluding the network connectivity between the nodes, what would be some
> other criteria that might give one setup an edge over the other, for
> regular Hadoop jobs?
> Also, assuming the same type of Hadoop jobs on both clusters, how different
> would the load experienced by the master node be for each setup above?
> Thanks in advance,