Is it possible to skip the HTTP map output fetch by feeding the map output file and index file directly to the reducer?
Dear Arun C Murthy, Pavan Kulkarni and all.
I am currently working on optimizing a Hadoop cluster that runs on Lustre FS.
According to the TeraSort benchmark, the remote map output copy seems to
take up a large part of the total runtime.
After searching, I found your discussion from half a year ago (
I am writing to ask whether we can make each reducer directly read its
part of each map output file based on the index file, and merge the parts
together, instead of making each map task generate a separate output for each
reduce task.
This way, it seems that not too many inodes would be needed.
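
To make the idea concrete, here is a rough sketch of the read path I have in mind. It is not Hadoop code. It assumes the index file holds one record per reduce partition, each made of three 8-byte big-endian longs (start offset, raw length, on-disk length), which I believe resembles the spill index layout but should be checked against the Hadoop version in use. The class name PartitionReader and the method readPartition are hypothetical, and decompression, checksums and the merge step are left out.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PartitionReader {
    // ASSUMPTION: each index record is three big-endian longs:
    // start offset, raw (uncompressed) length, on-disk length.
    static final int INDEX_RECORD_BYTES = 3 * Long.BYTES;

    /** Returns the bytes of one reduce partition from a single map output file. */
    public static byte[] readPartition(Path indexFile, Path dataFile, int partition)
            throws IOException {
        long start;
        long partLength;
        // Look up where this reducer's partition lives in the map output file.
        try (FileChannel index = FileChannel.open(indexFile, StandardOpenOption.READ)) {
            ByteBuffer rec = ByteBuffer.allocate(INDEX_RECORD_BYTES);
            readFully(index, rec, (long) partition * INDEX_RECORD_BYTES);
            rec.flip();
            start = rec.getLong();
            rec.getLong(); // raw length, not needed for a plain byte copy
            partLength = rec.getLong();
        }
        // Read only that byte range; on a shared file system every node can open the file.
        try (FileChannel data = FileChannel.open(dataFile, StandardOpenOption.READ)) {
            ByteBuffer segment = ByteBuffer.allocate(Math.toIntExact(partLength));
            readFully(data, segment, start);
            return segment.array();
        }
    }

    // Positional read that keeps going until the buffer is full or the file ends.
    private static void readFully(FileChannel ch, ByteBuffer buf, long position)
            throws IOException {
        while (buf.hasRemaining()) {
            int n = ch.read(buf, position + buf.position());
            if (n < 0) {
                throw new EOFException("file ended before the requested range was read");
            }
        }
    }
}
```

Each reducer would call something like readPartition(indexFile, dataFile, r) once per map output, with r being its own partition number, and then merge the returned segments. Each map task still writes only one data file and one index file, which is why the inode count should stay low.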
@Pavan Kulkarni: it seems no email was sent by you after Sep. 2012. Could you
please share some experience on how to optimize this kind of file system?
Does anyone have similar experience?
Any comments and replies are welcome and appreciated!