Re: Hadoop on Windows
http://issues.apache.org/jira/browse/HADOOP-4998 was opened to replace
bash calls with library calls. It has been open for 8 months now and
looks like it could use some help from Hadoop contributors. :)
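
To illustrate what "library calls instead of bash" might look like for the
df case raised further down this thread: java.io.File has had
getTotalSpace(), getFreeSpace() and getUsableSpace() since Java 6. The
following is only a minimal sketch under that assumption. The class and
method names (DiskSpaceSketch, usage) are made up for this example, and
nothing here is taken from HADOOP-4998 itself.

```java
import java.io.File;

// Hypothetical example class; not part of Hadoop.
public class DiskSpaceSketch {

    /**
     * Returns {capacity, used, available} in bytes for the partition
     * that holds the given path, using only java.io.File (Java 6+).
     */
    static long[] usage(File path) {
        long capacity = path.getTotalSpace();    // size of the partition
        long free = path.getFreeSpace();         // unallocated bytes
        long available = path.getUsableSpace();  // bytes this JVM may be able to use
        return new long[] { capacity, capacity - free, available };
    }

    public static void main(String[] args) {
        File path = new File(args.length > 0 ? args[0] : ".");
        long[] u = usage(path);
        System.out.printf("%s: capacity=%d used=%d available=%d%n",
                path.getAbsolutePath(), u[0], u[1], u[2]);
    }
}
```

Pass a directory as the first argument to inspect its partition, or omit it
to inspect the current directory. These File methods return 0 when the path
does not name a partition, so a real replacement for the df call would
still need its own error handling.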


On Sep 17, 2009, at 7:29 PM, Harish Mallipeddi wrote:

> MySpace recently released their map-reduce implementation as open source
> (it's .NET based). MySpace, as you might know, is one of the few big
> websites that run on Windows.
> http://code.google.com/p/qizmt/
> On Thu, Sep 17, 2009 at 10:42 PM, Steve Loughran <[EMAIL PROTECTED]> wrote:
>> Bill Habermaas wrote:
>>> It's interesting that Hadoop, being written entirely in Java, has such a
>>> spotty reputation running on different platforms. I had to patch it to run
>>> on AIX and need cygwin (gack!) so it will run on Windows. I'm surprised
>>> nobody has thought about removing its use of bash to run system commands
>>> (which is NOT especially portable). Now that Hadoop comes only in a
>>> Java 1.6 flavor, why can't it figure out disk space using the native Java
>>> runtime instead of executing the df command under bash? Of course it runs
>>> other system commands as well, which in my opinion isn't too cool.
>> It is run at scale on big Linux systems, and they are the ones that
>> encounter problems with 16GB heaps and exec(), and various other JVM quirks
>> that lead the developers to say Linux + Sun JVM only. You are free to use
>> other operating systems and even JVMs (I've used JRockit with some minor
>> logging problems in test runs), but you get to encounter the problems. You
>> can and should submit patches back, but if you diverge from the approved
>> standard, you get to retest at scale, because nobody else is going to do it
>> for you.
>>
>> Supporting different Unix versions is much easier than supporting
>> Windows+Linux/Unix, especially if you are trying to do high availability
>> stuff, integrate with management tools, etc. I think it would be nice if
>> Hadoop would build and run standalone on Windows without cygwin, but for all
>> other actions, a more ruthless "Unix-ish only" policy would be harsh but
>> make it easier to manage problems.
>>
>> Even in a Linux-only world, you are left with the "which distro" question:
>> were there to be official Apache Hadoop RPMs and .deb files, there'd be
>> discussions about which platforms to support. RHEL+CentOS 5.X would be the
>> obvious choice, but what else?
>> -steve
> --
> Harish Mallipeddi
> http://blog.poundbang.in