On 28/09/11 08:37, N Keywal wrote:
> For example:
> - It's adding two layers (Windows & Linux) that can both fail, especially
> under heavy workload (and Hadoop is built to use all the resources
> available). They will also need to be managed (software upgrades,
> hardware support...), which is an extra cost.
> - These two layers will use the different resources (HDD, CPU, network)
> randomly, making issue and performance analysis more complicated.
> - There will be a real performance impact. It depends on what you do and
> on how Windows & VMware are configured, but on my non-optimized laptop I
> lose more than 50%. VMware claims 15% max, but that's without Windows (using direct
Where you take the big hit is in disk IO: what your guest OS thinks is a
disk with sequentially stored files is really just a single file in the
host OS, one that may be scattered around the real HDD. Disk IO goes
through too many layers. It's often faster to NFS-mount the real HDD.
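
If you want to put a number on that overhead, time a sequential read
inside the guest and again on the host. A minimal Python sketch, assuming
a throwaway test file at /tmp/seqread.bin (the path and sizes are
placeholders, adjust for your setup):

    import os
    import time

    # Placeholder path and sizes -- point PATH at the disk under test.
    PATH = "/tmp/seqread.bin"
    SIZE = 1 << 30      # 1 GiB; use a file bigger than RAM so you measure
                        # the disk, not the page cache
    CHUNK = 1 << 20     # read in 1 MiB chunks

    # Create the test file once if it isn't there yet.
    if not os.path.exists(PATH):
        with open(PATH, "wb") as f:
            for _ in range(SIZE // CHUNK):
                f.write(os.urandom(CHUNK))

    start = time.time()
    with open(PATH, "rb") as f:
        while f.read(CHUNK):
            pass
    elapsed = time.time() - start
    print("sequential read: %.1f MB/s" % (SIZE / elapsed / 1e6))

Run it natively and in the VM and compare; the gap is your virtualization
tax on sequential IO.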
For compute-intensive work the performance hit isn't so bad, at least
provided you don't swap.
> - Last time I checked (a few months ago), VMware was not able to use all
> the cores & memory of medium-sized servers.
Same with VirtualBox, which I like because it is lighter weight.
I use VMs because the infrastructure provides them; things like ElasticMR
from AWS also offer this. Your code may be slower, but what you get is the
ability to bring up clusters on a pay-per-hour basis, and the ability to
vary the number of machines based on the workload/execution plan. If you
can compensate for the IO hit by renting four more servers, you may still
come out ahead; the sketch below walks through that arithmetic.
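
To make that concrete, here's a back-of-the-envelope calculation in
Python. Every figure in it (throughput penalty, hourly rate, node counts)
is a made-up assumption for illustration, not a measured or quoted number:

    # All numbers here are illustrative assumptions, not real prices.
    physical_nodes = 10
    node_throughput = 1.0      # work units/hour on bare metal
    vm_penalty = 0.5           # assume a VM delivers 50% of bare-metal speed
    vm_price_per_hour = 0.10   # assumed rate; check your provider's pricing
    job_size = 100.0           # total work units in the job

    # Time to finish on the fixed physical cluster:
    t_physical = job_size / (physical_nodes * node_throughput)

    # Rent enough VMs to match that wall-clock time despite the penalty:
    vms_needed = physical_nodes / vm_penalty
    t_vm = job_size / (vms_needed * node_throughput * vm_penalty)
    cost = vms_needed * t_vm * vm_price_per_hour

    print("physical: %.1f h on %d nodes" % (t_physical, physical_nodes))
    print("VMs:      %.1f h on %d nodes, ~$%.2f" % (t_vm, int(vms_needed), cost))

With a 50% penalty, doubling the node count buys back the wall-clock time,
and you only pay for the hours the cluster is actually up, which is the
whole point of pay-per-hour.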