-
Hadoop on a Virtualized O/S vs. the Real O/S
Stephen Watt 2010-02-08, 19:58
Hi Folks
I need to be able to certify that Hadoop works on various operating systems. I do this by running it through a series of tests. As I'm sure you can empathize, obtaining all the machines for each test run can sometimes be tricky. It would be easier for me if I could spin up several instances of a virtual image of the desired O/S, but to do this, I need to know whether I'm running any risks by using that approach.
Is there any reason why Hadoop might work differently on a virtual O/S as opposed to running on an actual O/S? Since just about everything is done through the JVM and SSH, I don't foresee any issues, and I don't believe we're doing anything weird with device drivers or have any kernel module dependencies.
Kind regards Steve Watt
-
RE: Hadoop on a Virtualized O/S vs. the Real O/S
Bill Habermaas 2010-02-08, 20:24
In my shop we also did certification on different operating platforms. This was done on virtualized machines for all the Linux variants. We ran the Apache Hadoop unit tests in each environment and then checked the results. Overall Hadoop runs well, but some of the more bizarre lunatic unit tests will react strangely.
You will likely see the same issues as we did...
1. Some networking APIs behave slightly differently between Linux and Solaris/AIX environments.
2. Windows will encounter many failed tests under Cygwin, and not in a consistent manner. Sometimes a test will work and other times it won't. I suspect this is because Cygwin is not a perfect simulation and race conditions cause different reactions, depending on the phase of the moon. Oh well, Windows is not for production anyway <shrug>
Bill
-
Re: Hadoop on a Virtualized O/S vs. the Real O/S
Steve Loughran 2010-02-09, 12:13
I run Hadoop on VMs
- Performance can be below raw IO rates, but that's predictable.
- If you bring up a private network then you have DNS/rDNS problems. Hadoop is happy if every machine knows who it is and DNS does too. Otherwise: edit the hosts tables.
- The big enemy on VMs is unexpected swapping out and clock drift, which screws up anything that assumes time moves forward at roughly the same rate everywhere. ZooKeeper assumes this, as do most distributed co-ordination systems. If you keep VM load low, with one virtual CPU per physical one, and don't overallocate physical memory, most of these problems go away.
- Set the CPU affinity for the VM so it is always bound to the same CPU, using taskset or the equivalent. This minimises cache misses and other problems.
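
For the DNS/rDNS point, a rough sketch of a sanity check that could be run on each VM before starting Hadoop. It assumes Python 3 is available on the node. The function name `check_dns_roundtrip` is made up for illustration and is not part of Hadoop. It only checks that a hostname resolves to an address and that the address resolves back to the same name, which is one reading of "everything knows who it is and DNS does too".

```python
import socket


def check_dns_roundtrip(hostname,
                        forward=socket.gethostbyname,
                        reverse=socket.gethostbyaddr):
    """Check that hostname -> IP -> hostname round-trips.

    Hypothetical helper, not a Hadoop API. The forward and reverse
    resolvers are parameters so the check can be exercised without a
    live DNS server.
    Returns (ok, detail), where detail is a human-readable summary.
    """
    try:
        ip = forward(hostname)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return False, "forward lookup of %s failed: %s" % (hostname, exc)

    try:
        # gethostbyaddr returns (primary_name, alias_list, ip_list)
        reverse_name = reverse(ip)[0]
    except OSError as exc:  # socket.herror is a subclass of OSError
        return False, "reverse lookup of %s failed: %s" % (ip, exc)

    # Compare case-insensitively and ignore a trailing dot.
    same = reverse_name.rstrip(".").lower() == hostname.rstrip(".").lower()
    return same, "%s -> %s -> %s" % (hostname, ip, reverse_name)
```

Calling `check_dns_roundtrip(socket.getfqdn())` on each VM gives a quick yes/no. If it comes back false on a private network, editing the hosts tables as described above is the usual workaround.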