Hadoop, mail # user - difference between development and production platform???


Re: difference between development and production platform???
Steve Loughran 2011-09-28, 09:16
On 28/09/11 04:19, Hamedani, Masoud wrote:
> Special Thanks for your help Arko,
>
> You mean in Hadoop, NameNode, DataNodes, JobTracker, TaskTrackers and all
> the clusters should be deployed on Linux machines???
> We have lots of data (on Windows) and code (written in C#) for data
> mining. We want to use Hadoop and connect
> our existing systems and programs to it.
> As you mentioned, we should move all of our data to Linux systems,
> run the existing C# code on Linux, and only use Windows for
> development, same as before.
> Am I right?
>

What is really meant is "nobody runs hadoop at scale on Windows".

Specifically
  -there's an expectation that there are Unix commands Hadoop can exec
  -some of the operations (e.g. how programs are exec()'d) are optimised
for Linux
  -everyone tests on 50+ node clusters on Linux.
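
To make the first point concrete, here is a rough sketch of a pre-flight check that reports which Unix commands are missing from the PATH on a given machine. The command list is purely illustrative and is an assumption, not something taken from the Hadoop source; the names ASSUMED_COMMANDS, find_commands and missing_commands are made up for this example.

```python
import shutil

# Illustrative list only: an assumption about the kind of Unix commands a
# Java program might shell out to, NOT a list taken from the Hadoop source.
ASSUMED_COMMANDS = ["bash", "whoami", "df", "du", "chmod", "ls"]


def find_commands(names):
    """Map each command name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_commands(names):
    """Return the command names that could not be found on PATH."""
    return [name for name, path in find_commands(names).items() if path is None]


if __name__ == "__main__":
    # Print a small report: one line per command, with its location if found.
    for name, path in find_commands(ASSUMED_COMMANDS).items():
        print(f"{name:8} {path or 'NOT FOUND'}")
```

On a stock Windows box without a Unix-like layer installed, most of these would be expected to show up as NOT FOUND, which is the kind of gap you would have to fill yourself.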

Why Linux? Stable, low cost. And you can install it on your
laptop/desktop and develop there too.
Because everyone uses Linux (or possibly a genuine Unix system like
Solaris), problems encountered in real systems get found on Linux and
fixed.

If you want to run a production Hadoop cluster on Windows, you are free
to do so. Just be aware that you may be the first person to do so at
scale, so you get to find problems first, you get to file the bugs -and
because you are the only person with these problems and the ability to
replicate them- you get to fix them.

Nobody is going to say "oh, this patch is for Windows-only use, we will
reject it" -at least provided it doesn't have adverse effects on
Linux/Unix. It's just that nobody else publicly runs Hadoop on Windows.
A key step 1 will be cross compiling all the native code to Windows,
which on 0.23+ also means protocol buffers. Enjoy.

Where you will find problems is that even on Win64, Hadoop can't
directly load or run C# apps or anything else written to compile against
their managed runtime (I forget its name). You will have to bridge via
streaming, and take a performance hit.
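
For anyone unfamiliar with streaming, the contract is simple: the mapper and reducer are external programs that read lines on stdin and write tab-separated key/value lines on stdout, with the reducer receiving its input sorted by key. A minimal word-count sketch of that contract is below, written in Python only to keep it short; a C# executable would fill the same role. The function names and the "map"/"reduce" command-line argument are assumptions for this example, not part of Hadoop.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record for every word on every line."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer: input is sorted by key, so sum the counts of each run of keys."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


# Hypothetical convention: pass "map" or "reduce" as the first argument.
# Nothing runs when the script is started without an argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

Every record crosses a process boundary as text and gets parsed again on the other side, which is where the performance hit comes from.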

You could also try running the C# code under Mono on Linux; it may or
may not work. Again, you get to find out and fix the problems -this time
with the Mono project.

-Steve