MapReduce >> mail # user >> Questions with regard to scheduling of map and reduce tasks


Vasco Visser 2012-08-30, 17:41
Vinod Kumar Vavilapalli 2012-08-30, 18:19
Vasco Visser 2012-08-30, 23:38
祝美祺 2012-08-31, 02:07
Vinod Kumar Vavilapalli 2012-08-31, 03:51

Re: Questions with regard to scheduling of map and reduce tasks
Thanks again for the reply; it is becoming clear.

While we are on the subject of going over the code: do you happen to know
where the code is that creates resource requests according to the
locations of HDFS blocks? I am looking for it, but the protocol
buffer code makes it difficult for me to follow what is going on.
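
As a rough illustration of what locality-aware resource requests look like, here is a minimal Python sketch. It is not the Hadoop code; it assumes the general YARN-style request model in which each task is asked for at three levels (the hosts holding a block replica, the racks of those hosts, and a `*` wildcard meaning "anywhere"). The function name `build_resource_requests` and the `rack_of` topology lookup are hypothetical, introduced only for this example.

```python
from collections import Counter

ANY = "*"  # wildcard location, meaning "any node in the cluster"


def build_resource_requests(splits, rack_of):
    """Aggregate container asks per location for a list of input splits.

    splits  : list of lists; each inner list holds the hosts that store
              a replica of one split's HDFS block
    rack_of : dict mapping host -> rack name (hypothetical topology lookup)
    """
    asks = Counter()
    for hosts in splits:
        # One ask per host that holds a replica of this split's block.
        for host in hosts:
            asks[host] += 1
        # One ask per distinct rack that contains a replica.
        for rack in {rack_of[h] for h in hosts}:
            asks[rack] += 1
        # Every task can also run anywhere as a fallback.
        asks[ANY] += 1
    return dict(asks)


if __name__ == "__main__":
    splits = [["h1", "h2"], ["h2", "h3"]]
    rack_of = {"h1": "/r1", "h2": "/r1", "h3": "/r2"}
    for location, count in sorted(build_resource_requests(splits, rack_of).items()):
        print(location, count)
```

The point of asking at all three levels is that a scheduler can satisfy a request on the exact host if possible, fall back to the same rack, and otherwise place the task anywhere.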

regards, Vasco

On Fri, Aug 31, 2012 at 5:51 AM, Vinod Kumar Vavilapalli
<[EMAIL PROTECTED]> wrote:
>
>> 0.23.1 with Pig 0.10.0 on top.
>
> Ok.
>
>> How is the preemption supposed to work? Is a single reducer supposed to
>> be preempted, or will a batch of reducers be preempted?
>
> A batch of reducers. Enough reducers will be killed to accommodate any/all
> pending map-tasks.
>
>> Also, when you
>> say preemption, do you mean that the current execution of a reducer is
>> actually paused and resumed again later? Or does preemption mean that
>> the reducer's container is discarded and must be started again from
>> scratch?
>
> No, by preempted, I mean that the current reduce tasks are killed. And
> because MapReduce tolerates an arbitrary number of killed task-attempts (as
> opposed to failed task-attempts), this is okay. So yes, the reducers, when
> they get rescheduled, will start all over again.
>
>> Do you know of any doc on the specifics of task scheduling? Would you
>> say that the example I gave is in line with how scheduling is
>> intended?
>
> We don't have docs on task-level scheduling, but you can look at
> RMContainerAllocator.java and related classes in MRAppMaster (i.e. the
> hadoop-mapreduce-client-app/ module) to understand this.
>
> And no, like I mentioned before, scheduling isn't random: maps first, and
> a slow reduce ramp-up as reducers finish.
>
>> FYI: the starvation issue is a known bug
>> (https://issues.apache.org/jira/browse/MAPREDUCE-4299).
>
> I mistakenly assumed that you were using the capacity-scheduler. There were
> other such bugs in both the FIFO and capacity-schedulers which got fixed
> (not sure of the fix version). We've tested the capacity-scheduler a lot
> more, if you pick up the latest version - 0.23.2/branch-0.23
>
> HTH
>
> +Vinod Kumar Vavilapalli
> Hortonworks Inc.
> http://hortonworks.com/
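
The preemption behaviour described in the quoted reply can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the logic of RMContainerAllocator: it assumes every task needs exactly one equally sized slot, and the function names and the limit of 4 failed attempts are chosen for illustration only.

```python
def reducers_to_preempt(pending_maps, free_slots, running_reducers):
    """Return how many running reducers to kill so pending maps can run.

    Assumes one slot per task (hypothetical simplification); a real
    allocator would reason about memory and available headroom instead.
    """
    shortfall = max(0, pending_maps - free_slots)
    # Cannot kill more reducers than are actually running.
    return min(shortfall, running_reducers)


def task_failed(attempt_states, max_failed_attempts=4):
    """Decide whether a task has exhausted its attempts.

    Only FAILED attempts count toward the limit; KILLED attempts
    (for example, preempted reducers) are tolerated without limit.
    """
    failed = sum(1 for state in attempt_states if state == "FAILED")
    return failed >= max_failed_attempts


if __name__ == "__main__":
    # 5 maps waiting, 2 free slots, 10 reducers running: kill 3 reducers.
    print(reducers_to_preempt(5, 2, 10))
    # A reducer preempted many times is still not considered failed.
    print(task_failed(["KILLED"] * 10))
```

The second function captures why killing reducers is safe in this model: a preempted attempt is restarted from scratch, but it never pushes the task toward its failure limit.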

Vinod Kumar Vavilapalli 2012-08-31, 22:59
Vasco Visser 2012-09-02, 16:41