Re: the part of the intermediate output fed to a reducer
Harsh J 2013-03-23, 19:56
On Sun, Mar 24, 2013 at 12:00 AM, preethi ganeshan
<[EMAIL PROTECTED]> wrote:
> Hey all,
> I am working on a project that schedules data-local reduce tasks.
Great, are you planning to contribute it upstream too? See
https://issues.apache.org/jira/browse/MAPREDUCE-199. I'm also hoping
you're working on trunk and not the maintenance branch branch-1, which
is far behind where MR is today.
> However, I wanted to know if there is a way, using MapTask.java, to keep
> track of the inputs and the size of the input to every reducer. In other
> words, what code do I add to get the size of the intermediate output that
> is fed to a reduce task before the reduce task begins?
Change the thinking here a bit: a map does not feed a reduce (i.e. it's
not a push). A reduce consumes a map's output after the map completes (the
map task JVM may terminate for all it cares). Upon a map's completion,
its counters are available centrally (i.e. at the ApplicationMaster),
which the reduce task can poll for sizes (it may already be doing