MapReduce >> mail # dev >> Shuffle phase: fine-grained control of data flow


Re: Shuffle phase: fine-grained control of data flow
Hi Jiwei,

In trunk (i.e. MR2), the logic that selects and schedules map
completion events lives in the getMapCompletionEvents() method of the
EventFetcher class, which you can view at
http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/EventFetcher.java?view=markup

The Shuffle class (in the same reduce package) runs this EventFetcher
thread so that it keeps picking up newly completed map outputs while
the shuffle is in progress. The Shuffle class is in turn used by the
ReduceTask class (look in the mapred package of the same Maven
module).

I guess you can start there to see whether a better selection and
scheduling logic would yield better results.
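
To make "a better selection and scheduling logic" a bit more concrete,
here is a minimal standalone sketch of one possible policy: fetch map
outputs from the reducer's own host first, then from its rack, then
from other racks. Everything in it (the LocalityAwareFetchOrder class,
the MapOutput type, the host and rack names) is hypothetical and is not
part of Hadoop; it does not touch EventFetcher or the shuffle
scheduler. Note also that reordering fetches only changes when data
crosses the network, not how much data has to be fetched in total.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, NOT Hadoop code: orders known map outputs by
// their network distance from the reducer that will fetch them.
public class LocalityAwareFetchOrder {

    /** Location of one completed map output (hypothetical type). */
    public static final class MapOutput {
        final String mapId;
        final String host;
        final String rack;

        public MapOutput(String mapId, String host, String rack) {
            this.mapId = mapId;
            this.host = host;
            this.rack = rack;
        }
    }

    /** Returns 0 for same host, 1 for same rack, 2 for a different rack. */
    static int distance(MapOutput m, String reducerHost, String reducerRack) {
        if (m.host.equals(reducerHost)) {
            return 0;
        }
        if (m.rack.equals(reducerRack)) {
            return 1;
        }
        return 2;
    }

    /** Returns map ids ordered nearest first; ties keep their arrival order. */
    public static List<String> fetchOrder(List<MapOutput> outputs,
                                          String reducerHost,
                                          String reducerRack) {
        List<MapOutput> sorted = new ArrayList<>(outputs);
        // List.sort is stable, so outputs at equal distance stay in
        // the order in which they were reported.
        sorted.sort(Comparator.comparingInt(
                (MapOutput m) -> distance(m, reducerHost, reducerRack)));
        List<String> ids = new ArrayList<>();
        for (MapOutput m : sorted) {
            ids.add(m.mapId);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<MapOutput> outputs = Arrays.asList(
                new MapOutput("m1", "host-c", "/rack2"),
                new MapOutput("m2", "host-a", "/rack1"),
                new MapOutput("m3", "host-b", "/rack1"),
                new MapOutput("m4", "host-d", "/rack2"));
        // The reducer is assumed to run on host-a in /rack1.
        System.out.println(fetchOrder(outputs, "host-a", "/rack1"));
        // prints [m2, m3, m1, m4]
    }
}
```

Whether something along these lines actually helps would have to be
measured on a real cluster.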

On Wed, Nov 7, 2012 at 12:26 PM, Jiwei Li <[EMAIL PROTECTED]> wrote:
> Dear all,
>
> For jobs like Sort, a massive amount of network traffic is generated
> during the shuffle phase. The simple mechanism that Hadoop 1.0.4 uses to
> choose reduce nodes does not help reduce this traffic. If the JobTracker
> is fully aware of the location of every map output, why not take
> advantage of this topology knowledge?
>
> So, does anyone know where in the code such a change should be made? Many thanks.
>
> Regards.
> --
> Jiwei

--
Harsh J