Re: Basic question on how reducer works
Robert,

On Jul 7, 2012, at 6:37 PM, Grandl Robert wrote:

> Hi,
>
> I have some questions related to basic functionality in Hadoop.
>
> 1. When a Mapper processes the intermediate output data, how does it know how many partitions to create (that is, how many reducers there will be) and how much data goes into each reducer's partition?
>
> 2. When the JobTracker assigns a task to a reducer, it also specifies the locations of the intermediate output data that the reducer should retrieve, right? But how does a reducer know which portion of the intermediate output to retrieve from each remote location?

To add to Harsh's comment: essentially, the TaskTracker (TT) *knows* where the output for a given map-id/reduce-id pair is located, via an output-file/index-file combination.
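
As a rough illustration of the output-file/index-file idea, here is a minimal Python sketch. It is not Hadoop's actual code or on-disk format: the record layout (two 64-bit integers, offset and length) and the function names write_map_output and read_partition are assumptions made up for this example. The point it tries to show is that one map task writes a single output file with one segment per reducer, and a small index lets the serving side hand back only the slice that belongs to the requesting reduce-id.

```python
import struct

# Hypothetical index record: (start_offset, length) as two big-endian
# 64-bit integers. Hadoop's real index format is not shown in this thread.
RECORD = struct.Struct(">qq")


def write_map_output(partitions):
    """Concatenate per-reducer segments into one output blob.

    partitions[i] holds the bytes destined for reducer i.
    Returns (output_bytes, index_bytes).
    """
    output = bytearray()
    index = bytearray()
    for segment in partitions:
        # Record where this reducer's segment starts and how long it is.
        index += RECORD.pack(len(output), len(segment))
        output += segment
    return bytes(output), bytes(index)


def read_partition(output, index, reduce_id):
    """Return only the bytes for reduce_id, located via the index."""
    offset, length = RECORD.unpack_from(index, reduce_id * RECORD.size)
    return output[offset:offset + length]


# One map task, three reducers; reducer 1 happens to receive no data.
segments = [b"apple\t1\n", b"", b"kiwi\t2\nplum\t5\n"]
output, index = write_map_output(segments)
```

With this layout, a fetch for reducer 2 would be served as read_partition(output, index, 2), so the reducer never has to scan data meant for other reducers.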

Arun

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/