Hadoop, mail # dev - mapper and reducer scheduling


Re: mapper and reducer scheduling
Hemanth Yamijala 2010-11-01, 04:01
Hi,

On Mon, Nov 1, 2010 at 9:13 AM, He Chen <[EMAIL PROTECTED]> wrote:
> If you use the default scheduler of Hadoop 0.20.2 or higher, the
> JobQueueTaskScheduler will take data locality into account.

This is true irrespective of the scheduler in use. Other schedulers
currently add a layer to decide which job to pick up first based on
constraints they choose to satisfy - like fairness, queue capacities
etc. Once a job is picked up, the logic for picking up a task within
the job is currently in framework code that all schedulers use.
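
To make the two layers concrete, here is a minimal sketch in Java. It is
not Hadoop code: the class LocalitySketch, its Task type, and the methods
distance, pickTask and assign are hypothetical names, and each task is
assumed to have a single input location (real splits can have several
replicas). The job-ordering layer is plain FIFO here; another scheduler
would only change that outer loop.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified model. None of these types are Hadoop classes.
final class LocalitySketch {

    // A pending map task and the (single, assumed) location of its input.
    static final class Task {
        final String id;
        final String host;
        final String rack;

        Task(String id, String host, String rack) {
            this.id = id;
            this.host = host;
            this.rack = rack;
        }
    }

    // Locality of a task relative to the requesting tracker; lower is better.
    static int distance(Task t, String trackerHost, String trackerRack) {
        if (t.host.equals(trackerHost)) return 0; // node-local
        if (t.rack.equals(trackerRack)) return 1; // rack-local
        return 2;                                 // non-local
    }

    // Shared layer: pick the closest pending task of an already chosen job
    // and remove it from the pending list.
    static Optional<Task> pickTask(List<Task> pending, String trackerHost, String trackerRack) {
        Task best = null;
        for (Task t : pending) {
            if (best == null
                    || distance(t, trackerHost, trackerRack) < distance(best, trackerHost, trackerRack)) {
                best = t;
            }
        }
        if (best != null) {
            pending.remove(best);
        }
        return Optional.ofNullable(best);
    }

    // Scheduler layer: decide which job goes first (FIFO in this sketch),
    // then delegate task selection to the shared logic above.
    static Optional<Task> assign(List<List<Task>> jobsInFifoOrder, String trackerHost, String trackerRack) {
        for (List<Task> job : jobsInFifoOrder) {
            Optional<Task> picked = pickTask(job, trackerHost, trackerRack);
            if (picked.isPresent()) {
                return picked;
            }
        }
        return Optional.empty();
    }
}
```

Note that in this sketch locality is only considered within the job the
scheduler has already chosen: a non-local task of the first job is handed
out before a node-local task of a later job.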

> That means when
> a heartbeat from a TT arrives, the JT first checks a cache that maps each
> node to the data-local tasks for that node. The JT assigns node-local
> tasks first, then rack-local, non-local, recovery, and speculative tasks,
> if they have default priorities.
>
> If a TT gets a non-local task, it will query the nodes that have the data
> and then finish the task. You can also decide whether to keep the fetched
> data on this TT by configuring the Hadoop mapred-site.xml file.
>
> BTW, even if a TT gets a data-local task, it may also ask other data owners
> (if you have more than one replica) for data to accelerate the process.
> (This is my understanding; can anyone confirm?)

Not that I am aware of. The task's input location is used directly to
read the data.

Thanks
Hemanth
>
> Hope this will help.
>
> Chen
>
> On Sun, Oct 31, 2010 at 9:49 PM, Zhenhua Guo <[EMAIL PROTECTED]> wrote:
>
>> Thanks!
>> One more question. Is the input file replicated on each node where a
>> mapper is run? Or is just the portion processed by a mapper
>> transferred?
>>
>> Gerald
>>
>> On Fri, Oct 29, 2010 at 10:11 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>> > Hello,
>> >
>> > On Fri, Oct 29, 2010 at 12:45 PM, Jeff Zhang <[EMAIL PROTECTED]> wrote:
>> >> The TaskTracker tells the JobTracker how many free slots it has
>> >> through its heartbeat, and the JobTracker chooses the best
>> >> TaskTracker, taking data locality into account.
>> >
>> > Yes. To add some more, a scheduler is responsible for assigning
>> > tasks (based on various stats, including data locality) to suitable
>> > TaskTrackers. Scheduler.assignTasks(TaskTracker) is used to assign a
>> > TaskTracker its tasks, and the scheduler type is configurable (some
>> > examples are the Eager/FIFO scheduler, the Capacity scheduler, etc.).
>> >
>> > This scheduling is done when a heartbeat response is about to be sent
>> > back to a TaskTracker that called JobTracker.heartbeat(...).
>> >
>> >>
>> >>
>> >> On Thu, Oct 28, 2010 at 2:52 PM, Zhenhua Guo <[EMAIL PROTECTED]> wrote:
>> >>> Hi, all
>> >>>  I wonder how Hadoop schedules mappers and reducers (e.g. does it
>> >>> consider load balancing or affinity to data?). For example, how does
>> >>> it decide on which nodes mappers and reducers are executed, and when?
>> >>>  Thanks!
>> >>>
>> >>> Gerald
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Best Regards
>> >>
>> >> Jeff Zhang
>> >>
>> >
>> >
>> >
>> > --
>> > Harsh J
>> > www.harshj.com
>> >
>>
>