if you have strong data locality demands, then try
http://peregrine_mapreduce.bitbucket.org/ Its 2x faster then hadoop for
multipass job types. It has also very fast node recovery. I plan to do
this for hdfs, concept is similar to "virtual nodes".
Its not hadoop or HDFS compatible and it has no ecosystem. I am not sure
if it is still under development, no commits in last months.