Hadoop >> mail # user >> MR job scheduler

Re: MR job scheduler

I'm not talking about the map phase. I'm talking about the reduce phase, which
starts after the maps finish.

The key "K" I'm referring to in my example is one of the distinct keys that the
maps output, and its corresponding values may be on any machine, depending on
where the map tasks ran. To start the reduce phase on a machine, that machine
has to copy all the values for a particular key over HTTP. I'm asking about how
that copying is done.
In that sense, am I right?
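To make the locality idea from the quoted question concrete, here is a rough sketch. Assume we know how many bytes of values for one key sit on each node. The sketch picks the node holding the most bytes and counts how many bytes would still have to be copied to it. The class and method names (`ShuffleLocality`, `bestNode`, `bytesMoved`) and the byte counts are made up for illustration. This is not how Hadoop actually places reduce tasks.

```java
import java.util.Map;
import java.util.TreeMap;

public class ShuffleLocality {
    // Hypothetical helper: returns the node that holds the most bytes of values for one key.
    static String bestNode(Map<String, Long> bytesPerNode) {
        String best = null;
        long max = -1;
        for (Map.Entry<String, Long> e : bytesPerNode.entrySet()) {
            if (e.getValue() > max) {
                max = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }

    // Hypothetical helper: bytes that must cross the network if the reduce for this key runs on `target`.
    static long bytesMoved(Map<String, Long> bytesPerNode, String target) {
        long total = 0;
        for (Map.Entry<String, Long> e : bytesPerNode.entrySet()) {
            if (!e.getKey().equals(target)) {
                total += e.getValue();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up sizes of map output for key "K" on three nodes.
        Map<String, Long> k = new TreeMap<>();
        k.put("node1", 10L);
        k.put("node2", 70L);
        k.put("node3", 20L);
        String d = bestNode(k);
        System.out.println(d + " " + bytesMoved(k, d));
    }
}
```

With these made-up numbers, running the reduce for K on node2 copies 30 bytes instead of the 90 it would copy on node1. One caveat: in Hadoop a reduce task processes a whole partition containing many keys, not a single key, so a per-key choice like this does not map directly onto real reduce scheduling.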

On Fri, Aug 21, 2009 at 11:53 AM, Arun C Murthy <[EMAIL PROTECTED]> wrote:

> On Aug 20, 2009, at 9:20 PM, bharath vissapragada wrote:
>> OK, I'll be a bit more specific.
>> Suppose the maps output 100 different keys.
>> Consider a key "K" whose corresponding values may be on N different datanodes.
>> Consider a datanode "D" that holds the largest number of those values. Instead
>> of moving the values on "D" to other systems, it would be better to bring the
>> values from the other datanodes to "D", to minimize data movement and also the
>> delay. The same applies to all the other keys. How does the scheduler take care
>> of this?
> Map-Reduce doesn't 'bring' values from N datanodes to the map. A map gets a
> single block of data to work with, N-1 other maps get the other N-1 blocks;
> thus multiple maps might get the key K and different values. Eventually the
> output of the maps i.e. K and values <V> land up at one of the reduces
> (based on the Partitioner). Please read some of the widely available
> map-reduce literature for more details.
> Arun
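The Partitioner Arun mentions decides which reduce receives a key. It does this from the key alone, not from where the key's values happen to be stored. Below is a standalone sketch of the arithmetic used by Hadoop's default `HashPartitioner`: clear the sign bit of the key's hash, then take it modulo the number of reduce tasks. One assumption to note: the sketch uses Java's `String.hashCode` in place of the key type's own `hashCode`. Hadoop's `Text` key hashes differently, so real partition numbers will differ. `PartitionSketch` and the sample pairs are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartitionSketch {
    // Same arithmetic as the default HashPartitioner: mask off the sign bit,
    // then take the remainder modulo the number of reduce tasks.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduces = 3;
        // Made-up (key, value) pairs, as if emitted by maps running on different nodes.
        String[][] mapOutputs = {
            {"K", "v1"}, {"A", "v2"}, {"K", "v3"}, {"B", "v4"}, {"K", "v5"}
        };
        // Group pairs by reduce partition: every value for K lands in the same
        // partition no matter which map (or node) produced it.
        Map<Integer, List<String>> byPartition = new TreeMap<>();
        for (String[] kv : mapOutputs) {
            int p = partitionFor(kv[0], numReduces);
            byPartition.computeIfAbsent(p, x -> new ArrayList<>()).add(kv[0] + "=" + kv[1]);
        }
        System.out.println(byPartition);
    }
}
```

With three reduces, both "K" and "B" hash to partition 0. This shows that a single reduce task receives several keys and fetches each map's share of its partition from wherever that map ran.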