Re: hadoop question using VMWARE
Steve Loughran 2011-09-28, 09:22
On 28/09/11 08:37, N Keywal wrote:
> For example:
> - It's adding two layers (windows & linux), that can both fail, especially
> under heavy workload (and hadoop is built to use all the resources
> available). They will need to be managed as well (software upgrades,
> hardware support...), it's an extra cost.
> - These two layers will use randomly the different resources (HDD,
> CPU, network) making issues and performance analysis more complicated.
> - there will be a real performance impact. It depends on what you do, and
> how Windows & vmware are configured, but on my non optimized laptop I lose
> more than 50%. VMWare claims 15% max, but it's without Windows (using direct
> ESX)

Where you take a big hit is in disk IO, as what your OS thinks is a disk with sequentially stored files is just a single file in the host OS that may be scattered round the real HDD. Disk IO goes through too many layers. It's often faster to NFS mount the real HDD. For compute-intensive work, the performance hit isn't so bad, at least provided you don't swap.

> - Last time I checked (a few months ago), vmware was not able to use all the
> core & memory of medium sized servers.

Same with VirtualBox, which I like because it is lighter weight.

I use VMs because the infrastructure provides them; things like ElasticMR from AWS also offer them. Your code may be slower, but what you get is the ability to bring up clusters on a pay-per-hour basis, and the ability to vary the number of machines based on the workload/execution plan. If you can compensate for the IO hit by renting four more servers, you may still come out ahead.

http://www.slideshare.net/steve_l/farming-hadoop-inthecloud
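
To put rough numbers on the "rent four more servers" trade-off, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the function names, the 16-node cluster, the 20% slowdown, and the hourly price are made-up values, not measurements from any real cluster or provider, and it assumes throughput scales linearly with the number of machines.

```python
def vm_servers_needed(physical_servers: int, slowdown_percent: int) -> int:
    """Number of VMs needed to match the throughput of `physical_servers`
    when each VM loses `slowdown_percent` of its throughput (hypothetical model,
    assumes linear scaling with machine count)."""
    if not 0 <= slowdown_percent < 100:
        raise ValueError("slowdown_percent must be in [0, 100)")
    remaining = 100 - slowdown_percent
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-physical_servers * 100 // remaining)


def rental_cost(servers: int, hours: float, price_per_server_hour: float) -> float:
    """Pay-per-hour cost of running `servers` machines for `hours`."""
    return servers * hours * price_per_server_hour


if __name__ == "__main__":
    baseline = 16   # hypothetical physical cluster size
    slowdown = 20   # hypothetical IO/virtualization hit, in percent
    needed = vm_servers_needed(baseline, slowdown)
    print(f"VMs needed: {needed} ({needed - baseline} extra)")
    # Hypothetical job: 10 hours at a made-up price of 0.5 per server-hour.
    print(f"Rental cost for the job: {rental_cost(needed, 10, 0.5)}")
```

With these made-up inputs, a 20% hit on a 16-node cluster works out to four extra rented machines. Whether that beats owning the hardware depends on how many hours per month the cluster actually runs.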