preethi ganeshan 2013-03-23, 18:30
Re: the part of the intermediate output fed to a reducer
Hi,

On Sun, Mar 24, 2013 at 12:00 AM, preethi ganeshan
<[EMAIL PROTECTED]> wrote:
> Hey all,
> I am working on a project that schedules data-local reduce tasks.

Great, are you planning to contribute it upstream too? See
https://issues.apache.org/jira/browse/MAPREDUCE-199. I'm also hoping
you're working on trunk and not the maintenance branch branch-1, which
is far behind where MR is today.

> However, I wanted to know if there is a way, using MapTask.java, to keep track of the
> inputs and the size of the input to every reducer. In other words, what code do
> I add to get the size of the intermediate output that is fed to a reduce
> task before the reduce task begins?

Change the thinking here a bit: a map does not feed a reduce (i.e. it's
not a push). A reduce consumes a map's output after the map completes (the
map task JVM may have terminated by then, for all it cares). Upon a map's
completion, its counters are available centrally (i.e. at the
ApplicationMaster), which the reduce task can poll for sizes (it may
already be doing this).
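
To make the pull model concrete, here is a rough, self-contained sketch in
plain Java. It is not Hadoop code: MapOutputRegistry, mapCompleted and
bytesForReducer are hypothetical names standing in for the central
bookkeeping, and reporting sizes as one long per reduce partition is an
assumption made only for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the central bookkeeping (the ApplicationMaster
// in the description above). None of these names are Hadoop APIs.
class MapOutputRegistry {
    // map task id -> bytes that map produced for each reduce partition
    private final Map<String, long[]> sizesByMap = new ConcurrentHashMap<>();

    // Called once a map task has finished; the map JVM may exit afterwards.
    void mapCompleted(String mapTaskId, long[] bytesPerPartition) {
        sizesByMap.put(mapTaskId, bytesPerPartition.clone());
    }

    // Called by a reduce task: total bytes reported so far for its partition.
    long bytesForReducer(int partition) {
        long total = 0;
        for (long[] sizes : sizesByMap.values()) {
            total += sizes[partition];
        }
        return total;
    }

    // Number of map tasks that have reported their sizes so far.
    int completedMaps() {
        return sizesByMap.size();
    }
}

class PullModelSketch {
    public static void main(String[] args) {
        MapOutputRegistry registry = new MapOutputRegistry();

        // Two maps finish and report sizes for two reduce partitions.
        registry.mapCompleted("map_0", new long[] {100L, 250L});
        registry.mapCompleted("map_1", new long[] {40L, 300L});

        // Each reducer polls the registry; no map pushes anything to it.
        for (int r = 0; r < 2; r++) {
            System.out.println("reducer " + r + " will pull "
                + registry.bytesForReducer(r) + " bytes from "
                + registry.completedMaps() + " completed maps");
        }
    }
}
```

The point of the sketch is only the direction of the data flow: sizes are
recorded when a map completes, and the reduce side asks for them later.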

--
Harsh J