Hadoop, mail # user - Why hadoop is written in java?


Re: Why hadoop is written in java?
Edward Capriolo 2010-10-13, 01:19
On Tue, Oct 12, 2010 at 12:20 AM, Chris Dyer <[EMAIL PROTECTED]> wrote:
> The Java memory overhead is a quite serious problem, and a legitimate
> and serious criticism of Hadoop. For MapReduce applications, it is
> often (although not always) possible to improve performance by doing
> more work in memory (e.g., using combiners and the like) before
> emitting data. Thus, the more memory available to your application,
> the more efficiently it runs. Therefore, if you have a framework that
> locks up 500 MB rather than 50 MB, you systematically get less
> performance out of your cluster.
>
> The second issue is that C/C++ bindings are common and widely used
> from many languages, but it is not generally possible to interface
> directly with Java (or Java libraries) from another language, unless
> that language is also built on top of the JVM. This is very
> unfortunate because many problems that would be quite naturally
> expressed in MapReduce are better solved in non-JVM languages.
>
> But, Java is what we have, and it works well enough for many things.
>
> On Mon, Oct 11, 2010 at 11:18 PM, Dhruba Borthakur <[EMAIL PROTECTED]> wrote:
>> I agree with others in this list that Java provides faster software
>> development, the IO cost in Java is practically the same as in C/C++, etc.
>> In short, most pieces of distributed software can be written in Java without
>> any performance hiccups, as long as it is only system metadata that is
>> handled by Java.
>>
>> One problem is when data-flow has to occur in Java. Each record that is read
>> from the storage has to be de-serialized, uncompressed and then processed.
>> This processing could be very slow in Java compared to when written in other
>> languages, especially because of the creation/destruction of too many
>> objects.  It would have been nice if the map/reduce task could have been
>> written in C/C++, or better still, if the sorting inside the MR framework
>> could occur in C/C++.
>>
>> thanks,
>> dhruba
>>
>> On Mon, Oct 11, 2010 at 4:50 PM, helwr <[EMAIL PROTECTED]> wrote:
>>
>>>
>>> Check out this thread:
>>> https://www.quora.com/Why-was-Hadoop-written-in-Java
>>> --
>>> View this message in context:
>>> http://lucene.472066.n3.nabble.com/Why-hadoop-is-written-in-java-tp1673148p1684291.html
>>> Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
>>>
>>
>>
>>
>> --
>> Connect to me at http://www.facebook.com/dhruba
>>
>
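
To make the combiner point quoted above concrete, here is a minimal sketch of doing more work in memory before emitting. It is plain Java with no Hadoop dependency; the class and method names (InMapperCombining, combine) are made up for illustration and are not part of any Hadoop API. Instead of emitting one (word, 1) pair per token, the counts are accumulated in a map and emitted once per distinct word.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not Hadoop code: aggregate in memory, emit once.
public class InMapperCombining {

    // Accumulates per-word counts in memory instead of emitting (word, 1)
    // for every token. The returned map stands in for the emitted records.
    static Map<String, Integer> combine(Iterable<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("java", "c", "java", "java", "c");
        Map<String, Integer> emitted = combine(tokens);
        // Five input tokens collapse into two emitted records.
        System.out.println(tokens.size() + " tokens -> " + emitted.size() + " records");
    }
}
```

The trade-off is the one described in the quoted message: the map lives on the heap, so the less memory the framework itself holds, the more distinct keys can be buffered before anything has to be emitted.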

Hate to say it this way... but this is yet another "Java is slow compared to
the equivalent nonexistent C/C++ alternative" argument.
Until http://code.google.com/p/qizmt/ wins the TeraSort benchmark or
Google open-sources Google MapReduce, the comparison is theoretical. I am
sure that if someone coded Hadoop in assembler, it would trump the
theoretical Hadoop written in C as well.
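
On the object creation/destruction cost raised in the quoted message, one common mitigation in Java is to deserialize every record into a single reusable holder rather than allocating a new object per record. The sketch below is only an illustration under assumptions: the record layout (a 4-byte int key followed by an 8-byte long value) and the names RecordReuse, Record, readInto and sumValues are invented for this example and are not Hadoop classes. Hadoop's Writable types are mutable, which permits a similar style of reuse.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration, not Hadoop code: one holder object for all records.
public class RecordReuse {

    // Mutable holder that is overwritten for each record.
    static final class Record {
        int key;
        long value;
    }

    // Reads the next record into the caller-supplied holder.
    // Returns false when no complete record remains in the buffer.
    static boolean readInto(ByteBuffer in, Record holder) {
        if (in.remaining() < Integer.BYTES + Long.BYTES) {
            return false;
        }
        holder.key = in.getInt();
        holder.value = in.getLong();
        return true;
    }

    // Sums all values while allocating exactly one Record for the whole pass.
    static long sumValues(ByteBuffer in) {
        Record record = new Record();
        long sum = 0;
        while (readInto(in, record)) {
            sum += record.value;
        }
        return sum;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(3 * (Integer.BYTES + Long.BYTES));
        for (int i = 1; i <= 3; i++) {
            buf.putInt(i).putLong(i * 10L);
        }
        buf.flip();
        System.out.println(sumValues(buf)); // prints 60
    }
}
```

The caveat with this style is that the holder must not be stored across iterations without copying it, since its fields are overwritten by the next read.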