|
elton sky
2010-10-10, 04:40
Dennis
2010-10-10, 04:43
maha
2010-10-10, 04:46
Arvind Kalyan
2010-10-10, 06:07
Shi Yu
2010-10-10, 06:39
BM
2010-10-10, 11:33
Ken Goodhope
2010-10-10, 19:00
Wildan Maulana )) OpenThi...
2010-10-10, 19:46
Owen O'Malley
2010-10-10, 20:17
Matt Tanquary
2010-10-10, 21:12
Shi Yu
2010-10-10, 21:44
Konstantin Boudnik
2010-10-11, 15:56
Steve Loughran
2010-10-11, 16:34
helwr
2010-10-11, 23:50
Dhruba Borthakur
2010-10-12, 03:18
Chris Dyer
2010-10-12, 04:20
Steve Loughran
2010-10-12, 09:45
Ricky Ho
2010-10-12, 16:57
Edward Capriolo
2010-10-13, 01:19
michael j pan
2010-10-13, 03:13
Steve Loughran
2010-10-13, 11:42
Scott Carey
2010-10-23, 02:07
Scott Carey
2010-10-23, 02:09
Steve Loughran
2010-10-25, 10:03
baloodevil
2011-03-16, 17:43
Ted Dunning
2011-03-16, 20:57
|
-
Why hadoop is written in java? elton sky 2010-10-10, 04:40
I have always had this question but couldn't find a proper answer to it. For
system-level applications, C/C++ is usually preferred. So why is this one written in Java?
-
Re: Why hadoop is written in java? Dennis 2010-10-10, 04:43
It's easier to use Java. With C/C++, you would need to write about ten times as much code as in Java, I think.
Dennis
-
Re: Why hadoop is written in java? maha 2010-10-10, 04:46
I totally agree with Dennis. Besides, Java is more secure than C++ (e.g., there are no raw pointer operations and memory is managed for you).
Maha
-
Re: Why hadoop is written in java? Arvind Kalyan 2010-10-10, 06:07
Look at the system (software) requirements for running Hadoop: http://hadoop.apache.org/common/docs/current/single_node_setup.html#PreReqs
Imagine how it would be if it were written in C/C++. While C/C++ might give you a performance improvement at run time, it can be a total nightmare to develop and maintain, especially if the network becomes heterogeneous.
-- Arvind Kalyan
-
Re: Why hadoop is written in java? Shi Yu 2010-10-10, 06:39
I wonder how Hadoop works with Python and other languages. Java is easy to develop in, but it is not very efficient at numerical computation on objects such as sparse matrices. Maybe Hadoop will get MATLAB and R extensions as well? I hope to see that happen.
-
Re: Why hadoop is written in java? BM 2010-10-10, 11:33
Long story short: because C/C++ sucks big time at clustering and development speed, especially when it comes to maintaining heterogeneity and security. At the same time, the benefit is not very big (rather, too small to pay attention to), since the performance gain is still very questionable. C++ is not so much faster than Java these days that someone should sacrifice an entire life, locking [him/her]self in a monastery cell for the whole Hadoop mission... :-)
-- Kind regards, BM
-
Re: Why hadoop is written in java? Ken Goodhope 2010-10-10, 19:00
You might want to take a look at Dumbo for writing Hadoop jobs in Python.
On Saturday, October 9, 2010, Shi Yu wrote:
> Wondering how Hadoop running with python and other languages.
-
Re: Why hadoop is written in java? Wildan Maulana )) OpenThi... 2010-10-10, 19:46
AFAIK, Google uses C/C++ to build the Hadoop-like system that powers Google search now... CMIIW
Regards, Wildan
-
Re: Why hadoop is written in java? Owen O'Malley 2010-10-10, 20:17
The real answer is that Hadoop was originally written to support Nutch, which is in Java. Java has mostly served us well: it is reliable, has extremely powerful libraries, and is far easier to debug than C++. There are issues, of course: Java's interface to the OS is very weak, object memory overhead is high, and program startup is very slow.
-- Owen
-
Re: Why hadoop is written in java? Matt Tanquary 2010-10-10, 21:12
Please check out the Rhipe and Mahout projects. There are others as well, but these are coming on strong, and Hadoop has many avenues for extension through things such as Python or MATLAB that you can take advantage of. The good thing is, if you have an algorithm or computational challenge that hasn't been met, you can solve it and share it with the rest of us.
On Sat, Oct 9, 2010 at 11:39 PM, Shi Yu wrote:
> Maybe hadoop will have Matlab, R extensions as well? Hope to see it happens.
-
Re: Why hadoop is written in java? Shi Yu 2010-10-10, 21:44
That sounds interesting. I am interested in using Hadoop to solve huge-scale convex/nonconvex optimization problems. I will take a look at them. Thanks.
Shi
-
Re: Why hadoop is written in java? Konstantin Boudnik 2010-10-11, 15:56
To second your point ;-) This reminds me of the time when Sun Micro bought GridEngine (a C app). A couple of other folks and I were developing a Distributed Task Execution Framework (written in Java on top of JINI). Every time a new version of, eh... Windows was coming around the corner, the Grid people were screaming. Guess how easy it was for us ;)
Cos
On Sat, Oct 09, 2010 at 11:07PM, Arvind Kalyan wrote:
> While C/C++ might give you a performance improvement at run-time, it can be a total nightmare to develop and maintain. Especially if the network gets to be heterogeneous.
-
Re: Why hadoop is written in java? Steve Loughran 2010-10-11, 16:34
On 11/10/10 16:56, Konstantin Boudnik wrote:
> Every time new version of eh... Windows was coming around the corner Grid people were screaming. Guess how easy it was for us ;)
That said, the only large-scale platform people are deploying Hadoop on is Linux, because it's the only one that other people running Hadoop are using. This leads to a bias in bug reports, optimisations and other deployment support. Even though Hadoop does run on other Unixes, Windows and OS/X, whoever deploys it at scale gets to find the issues. And if there is some problem where the fix helps you but hurts Linux installations, you aren't going to get your patch in. The same goes for non-Sun JVMs, which is one reason why I stopped using JRockit; the other is that Oracle stopped giving the security patches away to developers who weren't paying the fees.
Effectively Hadoop is a Linux-only application. Even there, being in Java has some advantages:
-no need to recompile the non-native bits for different OS releases.
-memory management makes it way, way easier to write applications that don't leak memory.
-good cross-platform build, testing and logging tools make it much easier for open source developers to play with.
-because you can run test builds on Windows, developers whose desktops are Windows can still code and debug locally. This makes it easier to play with Hadoop.
A C/C++ app would have to commit to an OS (inevitably, Linux) and use its build/test processes. You'd get good OS integration, at the cost of having to do more OS integration testing, and you'd scare off code contributions from anyone who wasn't a C/C++ on Linux developer. And you'd have to pick a Linux distribution to work "in".
Incidentally, Cos, I hear that Dan Templeton and Tom White were demoing Hadoop on Grid Engine last month. Not seen the slides though.
-Steve
-
Re: Why hadoop is written in java? helwr 2010-10-11, 23:50
Check out this thread: https://www.quora.com/Why-was-Hadoop-written-in-Java
-- View this message in context: http://lucene.472066.n3.nabble.com/Why-hadoop-is-written-in-java-tp1673148p1684291.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
-
Re: Why hadoop is written in java? Dhruba Borthakur 2010-10-12, 03:18
I agree with others on this list that Java provides faster software development, that the IO cost in Java is practically the same as in C/C++, etc. In short, most pieces of distributed software can be written in Java without any performance hiccups, as long as it is only system metadata that is handled by Java.
One problem is when data flow has to occur in Java. Each record that is read from storage has to be de-serialized, uncompressed and then processed. This processing could be very slow in Java compared with other languages, especially because of the creation/destruction of too many objects. It would have been nice if the map/reduce task could have been written in C/C++, or better still, if the sorting inside the MR framework could occur in C/C++.
thanks, dhruba
-
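A minimal sketch of the per-record object churn Dhruba describes, and one common way to soften it: reuse a single mutable record while de-serializing instead of allocating one object per record. The `PointRecord` class, its fields, and the byte layout are hypothetical, invented for illustration; this is not Hadoop's own code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical record type (an assumption for this sketch, not a Hadoop class).
final class PointRecord {
    int id;
    double value;

    // Overwrite this instance's fields from the stream instead of allocating a new object.
    void readFields(DataInputStream in) throws IOException {
        id = in.readInt();
        value = in.readDouble();
    }
}

final class RecordReuseSketch {
    // Serialize `count` records as (int id, double value) pairs; value is id * 0.5.
    static byte[] serialize(int count) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (int i = 0; i < count; i++) {
                out.writeInt(i);
                out.writeDouble(i * 0.5);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Sum the values of `count` records while allocating only one PointRecord.
    static double sumValues(byte[] data, int count) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            PointRecord reused = new PointRecord();
            double sum = 0.0;
            for (int i = 0; i < count; i++) {
                reused.readFields(in);
                sum += reused.value;
            }
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sumValues(serialize(4), 4));
    }
}
```

The trade-off is that a reused record must not be stored by reference across iterations, since the next read overwrites it.
-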
Re: Why hadoop is written in java? Chris Dyer 2010-10-12, 04:20
The Java memory overhead is quite a serious problem, and a legitimate and serious criticism of Hadoop. For MapReduce applications, it is often (although not always) possible to improve performance by doing more work in memory (e.g., using combiners and the like) before emitting data. Thus, the more memory available to your application, the more efficiently it runs. Therefore, if you have a framework that locks up 500 MB rather than 50 MB, you systematically get less performance out of your cluster.
The second issue is that C/C++ bindings are common and widely used from many languages, but it is not generally possible to interface directly with Java (or Java libraries) from another language, unless that language is also built on top of the JVM. This is very unfortunate because many problems that would be quite naturally expressed in MapReduce are better solved in non-JVM languages.
But Java is what we have, and it works well enough for many things.
-
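To make Chris's point about doing more work in memory before emitting data concrete, here is a small sketch that compares emitting one pair per token with aggregating counts in a map first. It is plain Java with no Hadoop dependency; the class and method names are made up for illustration, and a real job would use the framework's own mapper and combiner hooks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class InMemoryCombineSketch {
    // Naive approach: emit one "word<TAB>1" pair per token.
    static List<String> emitPerToken(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            for (String token : line.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    out.add(token + "\t1");
                }
            }
        }
        return out;
    }

    // Combining approach: aggregate counts in memory, one entry per distinct word.
    static Map<String, Integer> emitCombined(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String token : line.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a b a", "b a");
        System.out.println("pairs without combining: " + emitPerToken(lines).size());
        System.out.println("pairs with combining: " + emitCombined(lines).size());
    }
}
```

The map grows with the number of distinct keys, which is exactly where the memory available to the task starts to matter.
-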
Re: Why hadoop is written in java? Steve Loughran 2010-10-12, 09:45
On 12/10/10 05:20, Chris Dyer wrote:
> The second issue is that C/C++ bindings are common and widely used from many languages, but it is not generally possible to interface directly with Java (or Java libraries) from another language, unless that language is also built on top of the JVM.
A few years back I went from a Java project to six months doing something in C/C++. At first it was like rediscovering stuff: mixins! The ability to overload operators! STL! Then you start looking at the build and test process and think "this hasn't moved on for a while". Then you struggle with CppUnit to do test-first development of a COM service, setting up Cruise Control to run a build.xml that just <exec>s Visual Studio's build to build your app, and then you run the tests.
Eventually, the tests worked. But then there were the memory leaks, the reference counter problems, the threading and race condition issues, and the inconsistency between Windows and Linux. And the string types. Oh, so many string types: char*, TCHAR*, LPCSTR, BSTR, etc.
In Java, you have to go out of your way for a memory leak, so if your tests work, your code is functional and good to ship. But in C/C++, the engineering needed to go from code that passes its functional tests to code that doesn't leak memory, is thread safe and is secure is way harder. Try representing a large graph in C++ that is shared across threads without memory problems to see what I mean.
I agree, some Java independence would be nice, but I'd go higher, towards more graph- and list-centric languages, not closer to the metal. Scala support, anyone?
-
RE: Why hadoop is written in java? Ricky Ho 2010-10-12, 16:57
Is it easier if we change the question to: "Why did Java people create Hadoop before C++ people?"
I agree that for a framework like Hadoop, execution efficiency has a higher priority than developer productivity. And if the user can use any language to write the map and reduce functions (as with Hadoop Streaming), then we should use the most efficient language to write the core framework.
But again, don't forget the dynamics. It is not about which language is the most efficient. It is about which language the parallel computing experts who are willing to spend time on open source are more familiar with (or passionate about).
Rgds, Ricky
-
Re: Why hadoop is written in java? Edward Capriolo 2010-10-13, 01:19
On Tue, Oct 12, 2010 at 12:20 AM, Chris Dyer wrote:
> The Java memory overhead is a quite serious problem, and a legitimate and serious criticism of Hadoop.
Hate to say it this way... but this is yet another "java is slow compared to the equivalent non-existent c/c++ alternative" argument, at least until http://code.google.com/p/qizmt/ wins the TeraSort benchmark or Google open-sources Google MapReduce. I am sure that if someone coded Hadoop in assembler, it would trump the theoretical Hadoop written in C as well.
-
Re: Why hadoop is written in java? michael j pan 2010-10-13, 03:13
It would be good to recognize that Hadoop is a Java implementation of a MapReduce framework. There are other MapReduce framework implementations out there, written in other languages:
- for C/C++, Sector/Sphere http://sector.sourceforge.net/
- for Python/Erlang, Disco http://discoproject.org/
I'm sure there are others. To respond to Ricky (below), I doubt that Google (who wrote the MapReduce paper) implemented their MapReduce in Java. So the question may be: why is Hadoop (which implements MapReduce as described in that paper) the most popular MapReduce framework in the wild, even though it was neither the first nor the most efficient?
Cheers
Mike
On Wed, Oct 13, 2010 at 00:57, Ricky Ho wrote:
> Is it easier if we change the question to : "Why does Java people create Hadoop before C++ people ?"
-
Re: Why hadoop is written in java? Steve Loughran 2010-10-13, 11:42
On 13/10/10 04:13, michael j pan wrote:
> So the question may be, why is Hadoop (which implements MapReduce as described in that paper) the most popular MapReduce framework in the wild, even though it was not the first, nor the most efficient?
-good engineering effort at Y! and others means that it scales to double digits of petabytes and thousands of nodes, so everyone else knows they won't hit the limits
-good community evolving it
-regular release schedule
-good documentation/books
-evolving set of tools near it: Hive, Pig, HBase, Cassandra, Mahout, etc.
-
Re: Why hadoop is written in java? Scott Carey 2010-10-23, 02:07
On Oct 11, 2010, at 8:18 PM, Dhruba Borthakur wrote:
> One problem is when data-flow has to occur in Java. Each record that is read from the storage has to be de-serialized, uncompressed and then processed. This processing could be very slow in Java compared to when written in other languages, especially because of the creation/destruction of too many objects. It would have been nice if the map/reduce task could have been written in C/C++, or better still, if the sorting inside the MR framework could occur in C/C++.
There are many places left in Hadoop's design to improve the performance of these actions. The use of InputStream/OutputStream is not optimal at the record level in the intermediate data, for example. Essentially, the fact that the Writable interface causes per-tuple access to the slow InputStream/OutputStream APIs is a problem. As a guideline, never read from, or write to, InputStream/OutputStream in chunks of less than 128 bytes, and optimally go for 512 bytes or more. Some bits of Hadoop have done this optimization (TextInputFormat) and seen gains.
As for the memory consumption of Java itself, tuning the JVM parameters can go a long way, especially making sure that -XX:MaxNewSize is set so that, in larger heaps, the default 1/3 of the heap is not consumed by the young generation. The most recent JVM has enabled the -XX:+DoEscapeAnalysis flag to elide object allocation in several cases. Both that flag and another memory-saving flag, -XX:+UseCompressedOops, will be defaults in the next major HotSpot update.
OpenJDK's Java 7 also has two new sorting routines that improve Java sort performance by 20% to 100%. Hadoop could implement these algorithms (TimSort and Dual-Pivot Quicksort). I've seen a 10% performance gain in non-Hadoop applications when experimenting with the latest OpenJDK, which does register allocation and array bounds check elimination better than the current JRE 6.
In short, there is a lot left to do to Hadoop, IMO, to improve its performance, and Java is a great language for being able to safely do larger-scale refactorings and evolve a product. And the JVM itself is continuing to improve.
-
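Scott's guideline about chunk sizes can be illustrated with a small sketch that counts how many read calls reach the underlying stream with and without a `BufferedInputStream` in between. The `CountingStream` wrapper and the 8192-byte buffer size are assumptions made for this example; the message itself only asks for chunks of 512 bytes or more.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class BufferedReadSketch {
    // Assumed buffer size for this sketch only.
    static final int BUFFER_SIZE = 8192;

    // Wrapper that counts how many read calls reach the underlying stream.
    static final class CountingStream extends InputStream {
        private final InputStream delegate;
        int calls;

        CountingStream(InputStream delegate) {
            this.delegate = delegate;
        }

        @Override
        public int read() throws IOException {
            calls++;
            return delegate.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return delegate.read(b, off, len);
        }
    }

    // Encode the ints 0..count-1 as 4-byte big-endian values.
    static byte[] encodeInts(int count) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (int i = 0; i < count; i++) {
                out.writeInt(i);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read `count` ints from `in`, one small record at a time, and return their sum.
    static long sumInts(InputStream in, int count) {
        try {
            DataInputStream data = new DataInputStream(in);
            long sum = 0;
            for (int i = 0; i < count; i++) {
                sum += data.readInt();
            }
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Number of read calls that reach the raw stream when reading `count` ints.
    static int rawCalls(int count, boolean buffered) {
        CountingStream raw = new CountingStream(new ByteArrayInputStream(encodeInts(count)));
        InputStream in = buffered ? new BufferedInputStream(raw, BUFFER_SIZE) : raw;
        sumInts(in, count);
        return raw.calls;
    }

    public static void main(String[] args) {
        System.out.println("unbuffered raw calls: " + rawCalls(1000, false));
        System.out.println("buffered raw calls:   " + rawCalls(1000, true));
    }
}
```

With an in-memory source the saving is only in call count; against a file or socket, each avoided call is also an avoided trip into the OS, which is where the gains Scott mentions would come from.
-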
Re: Why hadoop is written in java? Scott Carey 2010-10-23, 02:09
On Oct 11, 2010, at 9:20 PM, Chris Dyer wrote:
> The second issue is that C/C++ bindings are common and widely used from many languages, but it is not generally possible to interface directly with Java (or Java libraries) from another language, unless that language is also built on top of the JVM. This is a very unfortunate because many problems that would be quite naturally expressed in MapReduce are better solved in non-JVM languages.
Scala is a more natural fit for the M/R paradigm since it is a strongly typed functional language, and it runs on the JVM. Unlike other non-Java functional languages on the JVM, it is as fast as Java. Perhaps someone will create a Scala API for Hadoop.
-
Re: Why hadoop is written in java? Steve Loughran 2010-10-25, 10:03
On 23/10/10 03:09, Scott Carey wrote:
> Scala is a more natural fit for the M/R paradigm since it is a strongly typed functional language, and it runs on the JVM. Unlike other non-Java functional languages on the JVM, it is as fast as Java. Perhaps someone will create a Scala API for Hadoop.
+1; would love to see a bridge package produced for Hadoop here
-
Re: Why hadoop is written in java? baloodevil 2011-03-16, 17:43
See this for a comment on Java handling numeric calculations such as sparse matrices... http://acs.lbl.gov/software/colt/
-
Re: Why hadoop is written in java? Ted Dunning 2011-03-16, 20:57
Note that that comment is now 7 years old.
See Mahout for a more modern take on numerics using Hadoop (and other tools) for scalable machine learning and data mining.
|