MapReduce user mailing list: Questions with regard to scheduling of map and reduce tasks


Vasco Visser 2012-08-30, 17:41
Vinod Kumar Vavilapalli 2012-08-30, 18:19
Vasco Visser 2012-08-30, 23:38
祝美祺 2012-08-31, 02:07
Re: Questions with regard to scheduling of map and reduce tasks

> 0.23.1 with Pig 0.10.0 on top.

Ok.

> How is the preemption supposed to work? Is a single reducer supposed to
> be preempted or will a batch of reducers be preempted?
A batch of reducers. Enough reducers will be killed to accommodate any/all pending map-tasks.
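
As a rough illustration of the batch behaviour described above, the sketch below computes how many running reducers would have to be killed to make room for the pending maps. It is a simplified model and not the actual logic in RMContainerAllocator: the class, method, and parameter names are hypothetical, and it assumes resources are measured only in memory (MB).

```java
// Hypothetical, simplified model of reducer preemption.
// This is NOT the real MRAppMaster code; all names are made up for illustration.
class PreemptionSketch {

    /**
     * Returns how many running reducers must be killed so that all pending
     * maps fit, given the memory that is currently free (headroomMb).
     */
    static int reducersToKill(int pendingMaps, int mapMemMb, int reduceMemMb,
                              int headroomMb, int runningReducers) {
        if (mapMemMb <= 0 || reduceMemMb <= 0) {
            throw new IllegalArgumentException("container sizes must be positive");
        }
        // Memory still missing after using up the free headroom.
        long neededMb = (long) pendingMaps * mapMemMb - headroomMb;
        if (neededMb <= 0) {
            return 0; // the pending maps already fit, nothing to preempt
        }
        // Round up: a reducer container can only be freed as a whole.
        long toKill = (neededMb + reduceMemMb - 1) / reduceMemMb;
        // We cannot kill more reducers than are running.
        return (int) Math.min(toKill, runningReducers);
    }

    public static void main(String[] args) {
        // 4 pending maps of 1024 MB, reducers of 2048 MB, 512 MB free, 10 reducers running.
        System.out.println(reducersToKill(4, 1024, 2048, 512, 10));
    }
}
```

The point of the sketch is only that the number of reducers killed follows from the pending map demand, so it is a batch rather than a single reducer.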

> Also, when you
> say preemption, do you mean that the current execution of a reducer is
> actually paused and resumed again later? Or does preemption mean that
> the reducer's container is discarded and must be started again from
> scratch?

No, by preempted, I mean that the current reduce tasks are killed. And because MapReduce tolerates an arbitrary number of killed task-attempts (as opposed to failed task-attempts), this is okay. So yes, the reducers will start all over again when they get rescheduled.
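
To make the killed-versus-failed distinction concrete, here is a minimal sketch of the accounting. It assumes a per-task limit on failed attempts (in MapReduce this kind of limit is configured by settings such as mapreduce.reduce.maxattempts); the class, enum, and method names are hypothetical and not taken from the Hadoop source.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: only FAILED attempts count against the attempt limit,
// KILLED attempts (e.g. preempted reducers) do not.
class AttemptAccountingSketch {

    enum Outcome { SUCCEEDED, FAILED, KILLED }

    /** Returns true once the task has used up its allowed failed attempts. */
    static boolean taskHasFailed(List<Outcome> attempts, int maxFailedAttempts) {
        long failed = attempts.stream()
                .filter(o -> o == Outcome.FAILED)
                .count();
        return failed >= maxFailedAttempts;
    }

    public static void main(String[] args) {
        // A reducer that was preempted three times and failed once.
        List<Outcome> attempts = Arrays.asList(
                Outcome.KILLED, Outcome.KILLED, Outcome.FAILED, Outcome.KILLED);
        System.out.println(taskHasFailed(attempts, 4));
    }
}
```

Under this model a reducer can be preempted any number of times without pushing the job towards failure, which is why killing reducers is an acceptable way to free resources.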

> Do you know of any doc on the specifics of task scheduling? Would you
> say that the example I gave is in line with how scheduling is
> intended?

We don't have docs on task-level scheduling, but you can look at RMContainerAllocator.java and related classes in MRAppMaster (i.e. the hadoop-mapreduce-client-app/ module) to understand this.

And no, as I mentioned before, scheduling isn't random: maps are scheduled first, followed by a slow reduce ramp-up as maps finish.
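
The "maps first, slow reduce ramp-up" idea can be sketched as follows. This is a deliberately simplified model, not the algorithm in RMContainerAllocator: it assumes reducers are held back until a slow-start fraction of maps has completed (the role played by mapreduce.job.reduce.slowstart.completedmaps) and are then allowed in proportion to map progress. The class, method, and parameter names are hypothetical.

```java
// Hypothetical sketch of a reduce ramp-up policy; the real allocator also
// accounts for available resources and other limits not modelled here.
class ReduceRampUpSketch {

    /**
     * Returns how many reducers may be scheduled, given map progress.
     * No reducers are allowed before the slow-start threshold is reached.
     */
    static int reducersAllowed(int completedMaps, int totalMaps,
                               int totalReducers, double slowStartThreshold) {
        if (totalMaps <= 0) {
            return totalReducers; // no maps to wait for
        }
        double completedFraction = (double) completedMaps / totalMaps;
        if (completedFraction < slowStartThreshold) {
            return 0; // maps first
        }
        // Ramp up in proportion to the fraction of completed maps.
        return (int) Math.floor(completedFraction * totalReducers);
    }

    public static void main(String[] args) {
        int totalMaps = 100;
        int totalReducers = 20;
        for (int done = 0; done <= totalMaps; done += 25) {
            System.out.println(done + " maps done: "
                    + reducersAllowed(done, totalMaps, totalReducers, 0.10)
                    + " reducers allowed");
        }
    }
}
```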

> FYI: the starvation issue is a known bug (https://issues.apache.org/jira/browse/MAPREDUCE-4299).
I mistakenly assumed you were using the Capacity Scheduler. There were other such bugs in both the FIFO and Capacity schedulers, which have been fixed (I'm not sure of the fix version). We've tested the Capacity Scheduler a lot more, if you pick up the latest version (0.23.2/branch-0.23).

HTH

+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/
Vasco Visser 2012-08-31, 11:17
Vinod Kumar Vavilapalli 2012-08-31, 22:59
Vasco Visser 2012-09-02, 16:41