

Re: Why hadoop is written in java?
On Tue, Oct 12, 2010 at 12:20 AM, Chris Dyer <[EMAIL PROTECTED]> wrote:
> The Java memory overhead is a quite serious problem, and a legitimate
> and serious criticism of Hadoop. For MapReduce applications, it is
> often (although not always) possible to improve performance by doing
> more work in memory (e.g., using combiners and the like) before
> emitting data. Thus, the more memory available to your application,
> the more efficiently it runs. Therefore, if you have a framework that
> locks up 500mb rather than 50mb, you systematically get less
> performance out of your cluster.
>
> The second issue is that C/C++ bindings are common and widely used
> from many languages, but it is not generally possible to interface
> directly with Java (or Java libraries) from another language, unless
> that language is also built on top of the JVM. This is very
> unfortunate because many problems that would be quite naturally
> expressed in MapReduce are better solved in non-JVM languages.
>
> But, Java is what we have, and it works well enough for many things.
>
> On Mon, Oct 11, 2010 at 11:18 PM, Dhruba Borthakur <[EMAIL PROTECTED]> wrote:
>> I agree with others on this list that Java provides faster software
>> development, that the IO cost in Java is practically the same as in C/C++, etc.
>> In short, most pieces of distributed software can be written in Java without
>> any performance hiccups, as long as it is only system metadata that is
>> handled by Java.
>>
>> One problem is when data flow has to occur in Java. Each record that is read
>> from storage has to be de-serialized, uncompressed and then processed.
>> This processing could be very slow in Java compared to other
>> languages, especially because of the creation/destruction of too many
>> objects.  It would have been nice if the map/reduce task could have been
>> written in C/C++, or better still, if the sorting inside the MR framework
>> could occur in C/C++.
>>
>> thanks,
>> dhruba
>>
>> On Mon, Oct 11, 2010 at 4:50 PM, helwr <[EMAIL PROTECTED]> wrote:
>>
>>>
>>> Check out this thread:
>>> https://www.quora.com/Why-was-Hadoop-written-in-Java
>>> --
>>> View this message in context:
>>> http://lucene.472066.n3.nabble.com/Why-hadoop-is-written-in-java-tp1673148p1684291.html
>>> Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
>>>
>>
>>
>>
>> --
>> Connect to me at http://www.facebook.com/dhruba
>>
>
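Chris's point about combiners (doing more aggregation in memory before emitting data) can be sketched in plain Java. This is an illustrative word-count example of the in-mapper combining idea, not Hadoop's actual Combiner API; all class and method names below are made up:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of in-mapper combining: aggregate counts in memory before
// emitting, instead of emitting one (word, 1) pair per token.
public class InMapperCombiner {

    // Naive approach: one emitted pair per token.
    static int naiveEmittedPairs(String[] tokens) {
        return tokens.length;
    }

    // Combined approach: aggregate in a map, emit one pair per distinct word.
    static Map<String, Integer> combine(String[] tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);  // add 1, or start at 1
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] tokens = {"hadoop", "java", "hadoop", "hadoop", "java"};
        System.out.println("naive emits: " + naiveEmittedPairs(tokens));
        System.out.println("combined emits: " + combine(tokens).size());
    }
}
```

The more heap the framework leaves free, the larger this in-memory map can grow before it must be flushed, which is exactly why a framework that locks up 500mb instead of 50mb costs you throughput.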

Hate to say it this way... but this is yet another "Java is slow
compared to the equivalent, non-existent C/C++ alternative" argument.
Until http://code.google.com/p/qizmt/ wins the TeraSort benchmark, or
until Google open sources Google MapReduce, it stays hypothetical. By
the same logic, I am sure a hadoop coded in assembler would trump the
theoretical hadoop written in C as well.
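For what it's worth, Hadoop itself mitigates Dhruba's object-churn concern with mutable, reusable Writable objects: one instance is refilled for every record rather than a fresh object being allocated per record. A plain-Java sketch of that reuse pattern (the class and method names are illustrative, not Hadoop's actual API):

```java
// Sketch of the mutable-record reuse pattern used by Hadoop Writables.
public class RecordReuse {

    // A mutable record, analogous in spirit to a Writable.
    static class MutableRecord {
        long key;

        void set(long key) {
            this.key = key;  // refill in place instead of allocating anew
        }
    }

    // Process n records with a single allocation, not n allocations.
    static long sumKeys(long n) {
        MutableRecord record = new MutableRecord();  // allocated once
        long total = 0;
        for (long i = 0; i < n; i++) {
            record.set(i);       // "deserialize" into the same object
            total += record.key; // process the current record
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumKeys(1_000_000));
    }
}
```

The pattern keeps garbage-collection pressure flat no matter how many records flow through, which is the usual answer to the "too many objects" criticism above.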