-
difference between development and production platform???
Hamedani, Masoud 2011-09-28, 02:32
Dear Friends,
I'm new to Hadoop and am using it for an important university data mining research project. I saw this statement in several Hadoop-related docs:
{ Win32 is supported as a *development platform*, not as a *production platform*, but Linux is supported as both. }
What is the difference between a *development platform* and a *production platform*? Does it refer to the DataNode and NameNode?
Thanks, B.S
-
Re: difference between development and production platform???
Arko Provo Mukherjee 2011-09-28, 03:05
Hi,
A development platform is the system (or systems) that developers mainly use to write and unit test code for the project.
There are generally NO end users on the development system.
A production platform is where the end users actually work; the project is generally moved there only after it has been tested on one or more test platforms.
Typically, if the developer is the end user, as is sometimes the case (and even more likely for university projects), there's generally no need to run your project on separate production or test systems.
The documentation means that you can use Hadoop on Win32 for developing your code, but running production boxes on Win32 (i.e., end users using a Win32 Hadoop system) is not supported.
Correct me if I am wrong, guys.
Thanks & regards Arko
-
Re: difference between development and production platform???
Hamedani, Masoud 2011-09-28, 03:19
Special thanks for your help, Arko.
Do you mean that in Hadoop the NameNode, DataNodes, JobTracker, TaskTrackers, and all the clusters should be deployed on Linux machines? We have lots of data (on Windows) and code (written in C#) for data mining, and we want to use Hadoop and connect our existing systems and programs to it. As you mentioned, we should move all of our data to Linux systems, execute the existing C# code on Linux, and use Windows only for development, as before. Am I right?
Thanks, B.S Masoud.
-
Re: difference between development and production platform???
Linden Hillenbrand 2011-09-28, 04:25
Currently Windows is not a supported production platform for Hadoop. You should run all of your daemons on Linux machines. You can easily move your data to HDFS on those nodes. For the C# piece, you can use Hadoop Streaming (http://hadoop.apache.org/common/docs/r0.20.2/streaming.html#Hadoop+Streaming) to leverage the code you have already written, and if you have trouble, it shouldn't be too bad to port it over to Java. Therefore you shouldn't have to do much rework.
I hope this helps.
Best, Linden
--
Linden Hillenbrand
Customer Operations Engineer
Phone: 650.644.3900 x4946
Email: [EMAIL PROTECTED]
Twitter: @lhillenbrand
Data: http://www.cloudera.com
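As a rough illustration of what Hadoop Streaming expects from an external program: a streaming mapper or reducer reads lines from standard input and writes tab-separated key/value lines to standard output, and the reducer receives its input sorted by key. The sketch below shows that contract in Python with a simple word count; the function names (`map_lines`, `reduce_lines`, `run`) are hypothetical, and the same shape would apply to an existing C# executable.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts per key; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run(role, stdin=sys.stdin, stdout=sys.stdout):
    """Hypothetical entry point: role is 'map' or 'reduce'."""
    step = map_lines if role == "map" else reduce_lines
    for record in step(stdin):
        stdout.write(record + "\n")
```

A small wrapper script would call `run("map")` or `run("reduce")`, and that script is what gets named as the mapper or reducer when the job is submitted.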
-
Re: difference between development and production platform???
Arko Provo Mukherjee 2011-09-28, 04:30
Hi,
You don't necessarily need to execute the C# code on Linux.
You can write a middleware application to bring the data from the Win boxes to the Linux (Hadoop) boxes if you want to.
Cheers Arko
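A minimal sketch of the simplest form such middleware could take, assuming the exported files are already visible on a Linux machine that has the Hadoop client installed (for example through a network share). It only wraps the `hadoop fs -put` shell command; the helper names and the paths in the usage note are hypothetical.

```python
import posixpath
import subprocess


def build_put_command(local_path, hdfs_dir):
    """Return the argv for copying one local file into an HDFS directory."""
    file_name = posixpath.basename(local_path)
    target = posixpath.join(hdfs_dir, file_name)
    return ["hadoop", "fs", "-put", local_path, target]


def stage_file(local_path, hdfs_dir):
    """Run the copy; raises CalledProcessError if the hadoop CLI fails."""
    subprocess.run(build_put_command(local_path, hdfs_dir), check=True)
```

For example, `stage_file("/mnt/export/part1.csv", "/user/lab/input")` would place the file at `/user/lab/input/part1.csv` in HDFS, provided the `hadoop` command is on the PATH.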
-
Re: difference between development and production platform???
Linden Hillenbrand 2011-09-28, 04:47
Hadoop Streaming :)
Linden
-
Re: difference between development and production platform???
Hamedani, Masoud 2011-09-28, 04:50
Thanks for your kind help, Arko. Maybe because I'm new to Hadoop, I can't follow some of the points; I'm studying the Hadoop manual more deeply to get a better understanding.
B.S Masoud.
-
Re: difference between development and production platform???
Steve Loughran 2011-09-28, 09:16
On 28/09/11 04:19, Hamedani, Masoud wrote:
> Do you mean that in Hadoop the NameNode, DataNodes, JobTracker, TaskTrackers,
> and all the clusters should be deployed on Linux machines?
> We have lots of data (on Windows) and code (written in C#) for data
> mining, and we want to use Hadoop and connect our existing systems and
> programs to it.
> As you mentioned, we should move all of our data to Linux systems,
> execute the existing C# code on Linux, and use Windows only for
> development, as before.
> Am I right?
What is really meant is "nobody runs hadoop at scale on Windows".
Specifically
-there's an expectation that there is a unix API you can exec
-some of the operations (e.g. how programs are exec()'d) are optimised for linux
-everyone tests on 50+ node clusters on Linux.
Why Linux? Stable, low cost. And you can install it on your laptop/desktop and develop there too.
Because everyone uses Linux (or possibly a genuine Unix system like Solaris), problems encountered in real systems get found on Linux and fixed.
If you want to run a production Hadoop cluster on Windows, you are free to do so. Just be aware that you may be the first person to do so at scale, so you get to find problems first, you get to file the bugs -and because you are the only person with these problems and the ability to replicate them- you get to fix them.
Nobody is going to say "oh, this patch is for Windows only use, we will reject it" -at least provided it doesn't have adverse effects on Linux/Unix. It's just that nobody else publicly runs Hadoop on Windows. A key step 1 will be cross compiling all the native code to Windows, which on 0.23+ also means protocol buffers. Enjoy.
Where you will find problems is that even on Win64, Hadoop can't directly load or run C# apps or anything else written to compile against their managed runtime (I forget its name). You will have to bridge via streaming, and take a performance hit.
You could also try running the C# code under Mono on Linux; it may or may not work. Again, you get to find out and fix the problems -this time with the Mono project.
-Steve
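A hedged sketch of how the Mono route might be combined with Streaming: the job is submitted with the streaming jar, the compiled C# executable is shipped with `-file`, and the mapper command runs it under `mono`. The helper name and all paths are hypothetical, the location of the streaming jar depends on the installation, and Mono is assumed to be installed on every task node.

```python
import posixpath


def build_streaming_command(streaming_jar, input_dir, output_dir, mapper_exe):
    """Return the argv for a Streaming job whose mapper is a C# exe run by Mono."""
    # -file ships the executable to each task's working directory,
    # so the mapper command refers to it by file name only.
    exe_name = posixpath.basename(mapper_exe)
    return [
        "hadoop", "jar", streaming_jar,
        "-input", input_dir,
        "-output", output_dir,
        "-mapper", f"mono {exe_name}",
        "-file", mapper_exe,
    ]
```

The returned list can be passed to `subprocess.run(..., check=True)` on a machine with the Hadoop client; whether a given C# program behaves correctly under Mono still has to be tested case by case.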
-
Re: difference between development and production platform???
Hamedani, Masoud 2011-09-29, 04:02
Dear Steve,
Thanks for your useful comments; I completely agree with you. Personally, for more than 10 years I have used only Fedora, Java and Java-related technologies, and open source software in all of my projects. But this is a critical situation: all of the current data and apps in our university's lab are deployed on the Microsoft platform. We can transfer our data from Windows to Linux, but all of the code is written in C#. We can connect the C# code to Hadoop and run it on Linux too, but personally I can't guarantee the result.
*SO AS A SUMMARY*:
1- we can only use Linux machines as the production platform,
2- and use Windows only as a *development platform* in pseudo-distributed mode.
AM I RIGHT in 1 and 2? Please correct or verify them.
Thanks, BS. Masoud,