Is it possible to skip the HTTP map output fetch by feeding the map output file and index file directly to the reducer?
Ling Kun 2013-02-28, 08:57
Dear Arun C Murthy, Pavan Kulkarni and all.
Hello! I am currently working on optimizing a Hadoop cluster that runs on the Lustre file system. According to the TeraSort benchmark, the remote map output copy seems to account for a large part of the total runtime. After searching, I found your discussion from half a year ago ( http://search-hadoop.com/m/jj3y46KUwC1 ).

I am writing to ask whether we can make each reducer read its own part of every map output file directly, using the index file to locate it, and then merge those parts together, instead of making each map task generate a separate output for each reduce task. That way, it seems that not too many inodes would be needed.

@Pavan Kulkarni: no email was sent by you after Sep. 2012. Could you please kindly share some experience on how to optimize for a file system like Lustre?

Does anyone have similar work experience? Any comments and replies are welcome and appreciated!

Yours,
Ling Kun

--
http://www.lingcc.com
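
To make the question concrete, here is a minimal sketch of what "read its part based on the index file" could look like when the map output is visible on a shared file system. It assumes the index file holds one 24-byte record per reduce partition (three big-endian longs: start offset, raw length, on-disk length), which is how I understand the Hadoop 1.x index file to be laid out; please treat that layout as an assumption. The class and method names (DirectSegmentReader, readIndexRecord, readSegment) are hypothetical and are not Hadoop APIs. The sketch returns the raw segment bytes only and does not decode, decompress, or checksum them.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper, not part of Hadoop: locates and reads one reducer's
// segment of a map output file through the accompanying index file.
final class DirectSegmentReader {
    // Assumed index layout: 3 longs (start offset, raw length, on-disk length)
    // per reduce partition, stored back to back.
    static final int RECORD_BYTES = 3 * Long.BYTES;

    static final class IndexRecord {
        final long startOffset; // where the segment begins in the data file
        final long rawLength;   // uncompressed length of the segment
        final long partLength;  // length of the segment as stored on disk

        IndexRecord(long startOffset, long rawLength, long partLength) {
            this.startOffset = startOffset;
            this.rawLength = rawLength;
            this.partLength = partLength;
        }
    }

    // Seek to the record for the given reduce partition and read its three longs.
    static IndexRecord readIndexRecord(String indexPath, int partition) throws IOException {
        try (RandomAccessFile idx = new RandomAccessFile(indexPath, "r")) {
            idx.seek((long) partition * RECORD_BYTES);
            long start = idx.readLong(); // readLong is big-endian
            long raw = idx.readLong();
            long part = idx.readLong();
            return new IndexRecord(start, raw, part);
        }
    }

    // Read exactly the bytes that belong to one reducer from the map output file.
    // For simplicity this sketch assumes the segment fits in a single byte array.
    static byte[] readSegment(String dataPath, IndexRecord rec) throws IOException {
        try (RandomAccessFile data = new RandomAccessFile(dataPath, "r")) {
            byte[] buf = new byte[(int) rec.partLength];
            data.seek(rec.startOffset);
            data.readFully(buf);
            return buf;
        }
    }
}
```

With something like this, reducer r would call readIndexRecord(indexPath, r) and readSegment(dataPath, rec) for every finished map task and feed the resulting segments into the merge, so no per-reducer output files and no HTTP copy would be involved. Whether the reducer's merge code can accept segments obtained this way is exactly the part I am unsure about.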