Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hadoop, mail # user - MR job scheduler


Copy link to this message
-
Re: MR job scheduler
bharath vissapragada 2009-08-21, 04:20
OK i'll be a bit more specific ,

Suppose map outputs 100 different keys .

Consider a key "K" whose correspoding values may be on N diff datanodes.
Consider a datanode "D" which have maximum number of values . So instead of
moving the values on "D"
to other systems , it is useful to bring in the values from other datanodes
to "D" to minimize the data movement and
also the delay. Similar is the case with All the other keys . How does the
scheduler take care of this ?
2009/8/21 zjffdu <[EMAIL PROTECTED]>

> Add some detials:
>
> 1. #map is determined by the block size and InputFormat (whether you can
> want to split or not split)
>
> 2. The default scheduler for Hadoop is FIFO, and the Fair Scheduler and
> Capacity Scheduler are other two options as I know.  JobTracker has the
> scheduler.
>
> 3. Once the map task is done, it will tell its own tasktracker, and the
> tasktracker will tell jobtracker, so jobtracker manage all the tasks, and
> it
> will decide how to and when to start the reduce task
>
>
>
> -----Original Message-----
> From: Arun C Murthy [mailto:[EMAIL PROTECTED]]
> Sent: 2009年8月20日 11:41
> To: [EMAIL PROTECTED]
> Subject: Re: MR job scheduler
>
>
> On Aug 20, 2009, at 9:00 AM, bharath vissapragada wrote:
>
> > Hi all,
> >
> > Can anyone tell me how the MR scheduler schedule the MR jobs?
> > How does it decide where t create MAP tasks and how many to create.
> > Once the MAP tasks are over how does it decide to move the keys to the
> > reducer efficiently(minimizing the data movement across the network).
> > Is there any doc available which describes this scheduling process
> > quite
> > efficiently
> >
>
> The #maps is decided by the application. The scheduler decides where
> to execute them.
>
> Once the map is done, the reduce tasks connect to the tasktracker (on
> the node where the map-task executed) and copies the entire output
> over http.
>
> Arun
>
>