Re: Questions with regard to scheduling of map and reduce tasks

> 0.23.1 with Pig 0.10.0 on top.


> How is the preemption supposed to work? Is a single reducer supposed to
> be preempted or will a batch of reducers be preempted?
A batch of reducers. Enough reducers will be killed to accommodate any/all pending map-tasks.
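
To make the batch behavior concrete, here is a minimal sketch of how one might compute the number of reducers to kill so that the pending maps fit. It is an illustrative model only, not the code in RMContainerAllocator; the class name, method name, parameters, and the memory-only accounting are all assumptions made for the example.

```java
// Illustrative model only; NOT the actual RMContainerAllocator logic.
// All names and the memory-only accounting are hypothetical.
final class PreemptionSketch {

    /**
     * Estimates how many running reducers must be killed so that all
     * pending map tasks can get a container. Sizes are in MB.
     */
    static int reducersToPreempt(int pendingMaps, int mapMemMb, int reduceMemMb,
                                 int freeMemMb, int runningReducers) {
        if (pendingMaps <= 0 || reduceMemMb <= 0) {
            return 0; // nothing is waiting, so nothing is preempted
        }
        // Memory the pending maps need beyond what is already free.
        long shortfall = (long) pendingMaps * mapMemMb - freeMemMb;
        if (shortfall <= 0) {
            return 0; // the maps already fit
        }
        // Round up: a partially needed reducer container still has to go.
        long toKill = (shortfall + reduceMemMb - 1) / reduceMemMb;
        // Cannot kill more reducers than are running.
        return (int) Math.min(toKill, runningReducers);
    }
}
```

For example, with 4 pending maps of 1024 MB each, 1024 MB free, and 2048 MB reducers, the shortfall is 3072 MB, so this model would kill 2 reducers in one batch.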

> Also, when you
> say preemption, do you mean that the current execution of a reducer is
> actually paused and resumed again later. Or, does preemption mean that
> the reducer's container is discarded and must be started again from
> scratch?

No. By preempted, I mean that the currently running reduce tasks are killed. Because MapReduce tolerates an arbitrary number of killed task-attempts (as opposed to failed task-attempts), this is okay. So yes, when the reducers get rescheduled, they start all over again.
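
The killed-versus-failed distinction can be sketched as follows. This is a simplified model with hypothetical names, not the MRAppMaster implementation: only FAILED attempts count against the per-task attempt limit, so any number of KILLED attempts (for example from preemption) leaves the task eligible to run again.

```java
import java.util.List;

// Simplified model with hypothetical names; not the MRAppMaster code.
final class AttemptAccountingSketch {

    enum Outcome { SUCCEEDED, FAILED, KILLED }

    /**
     * A task is considered failed only once its FAILED attempts reach
     * the limit. KILLED attempts are ignored, however many there are.
     */
    static boolean taskHasFailed(List<Outcome> attempts, int maxFailedAttempts) {
        int failed = 0;
        for (Outcome outcome : attempts) {
            if (outcome == Outcome.FAILED) {
                failed++;
            }
        }
        return failed >= maxFailedAttempts;
    }
}
```

Under this model, a reducer that was preempted ten times and never failed is still schedulable, while one with four failed attempts and a limit of four is not.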

> Do you know of any doc on the specifics of task scheduling? Would you
> say that the example I gave is in line with how scheduling is
> intended?

We don't have docs on task-level scheduling, but to understand it you can look at RMContainerAllocator.java and related classes in MRAppMaster (i.e. the hadoop-mapreduce-client-app/ module).

And no. As I mentioned before, scheduling isn't random: maps go first, with a slow reduce ramp-up as reducers finish.
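
As a rough illustration of "maps first, then a slow reduce ramp-up", the sketch below models a slow-start threshold followed by a proportional ramp. It is an assumption-laden toy, not the real allocator: the class, method, and parameters are hypothetical, and the threshold is only analogous to the mapreduce.job.reduce.slowstart.completedmaps setting.

```java
// Toy model with hypothetical names; the real logic lives in
// RMContainerAllocator and considers more than map progress.
final class ReduceRampUpSketch {

    /**
     * Returns how many reducers this model would allow to be scheduled.
     * No reducers start until the completed-map fraction reaches
     * slowStartFraction; after that the allowance grows with map progress.
     */
    static int reducersAllowed(int totalMaps, int completedMaps,
                               int totalReducers, double slowStartFraction) {
        if (totalMaps <= 0) {
            return totalReducers; // no maps to wait for
        }
        double done = (double) completedMaps / totalMaps;
        if (done < slowStartFraction) {
            return 0; // maps first: reducers are held back
        }
        // Ramp up in proportion to map progress, rounding up.
        return (int) Math.ceil(done * totalReducers);
    }
}
```

With 100 maps, 10 reducers, and a threshold of 0.05, this model allows no reducers after 2 completed maps, 5 reducers after 50, and all 10 once every map is done.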

> FYI: the starvation issue is a known bug (https://issues.apache.org/jira/browse/MAPREDUCE-4299).
I mistakenly assumed you were using the capacity-scheduler. There were other such bugs in both the Fifo and capacity-schedulers which got fixed (I'm not sure of the fixed-version). We've tested the capacity-scheduler a lot more, so consider picking up the latest version: 0.23.2/branch-0.23.


+Vinod Kumar Vavilapalli
Hortonworks Inc.