

Re: One output file per node
If you have strong data locality requirements, then try Peregrine
(http://peregrine_mapreduce.bitbucket.org/). It's 2x faster than Hadoop for
multipass job types. It also has very fast node recovery. I plan to do
this for HDFS; the concept is similar to "virtual nodes".

It's not Hadoop- or HDFS-compatible, and it has no ecosystem. I am not sure
whether it is still under development; there have been no commits in the
last few months.

https://bitbucket.org/burtonator/peregrine/commits