MapReduce, mail # dev - Is it possible to skip the HTTP map output fetch by feeding the map output file and index file directly to the reducer?


Ling Kun 2013-02-28, 08:57
Dear Arun C Murthy, Pavan Kulkarni and all.
     Hello!
     I am currently working on optimizing a Hadoop cluster that runs on the
Lustre file system. According to the TeraSort benchmark, the remote map
output copy seems to take a large part of the total runtime.
     After searching, I found your discussion from half a year ago (
http://search-hadoop.com/m/jj3y46KUwC1 ).

     I am writing to ask whether we can make each reducer directly read its
part of each map output file, based on the index file, and merge the parts
together, instead of making each map task generate output for each reduce task.

    In this way, it seems that not too many inodes are needed.
@Pavan Kulkarni: no email was sent by you after Sep. 2012. Could you please
kindly share some experience on how to optimize for this kind of file system,
such as Lustre?

  Does anyone have similar work experience?
  Any comments and replies are welcome and appreciated!

yours,
Ling Kun.
--
http://www.lingcc.com