MapReduce, mail # user - Questions with regard to scheduling of map and reduce tasks
Re: Questions with regard to scheduling of map and reduce tasks
Vasco Visser 2012-08-31, 11:17
Thanks again for the reply; it is becoming clear.

While on the subject of going over the code, do you happen to know
where the code is that creates resource requests based on the
locations of HDFS blocks? I have been looking for it, but the protocol
buffer code makes it difficult for me to follow what is going on.

regards, Vasco

On Fri, Aug 31, 2012 at 5:51 AM, Vinod Kumar Vavilapalli
<[EMAIL PROTECTED]> wrote:
>
>> 0.23.1 with Pig 0.10.0 on top.
>
> Ok.
>
>> How is the preemption supposed to work? Is a single reducer supposed to
>> be preempted, or will a batch of reducers be preempted?
>
> A batch of reducers. Enough reducers will be killed to accommodate any/all
> pending map-tasks.
>
>> Also, when you
>> say preemption, do you mean that the current execution of a reducer is
>> actually paused and resumed again later? Or does preemption mean that
>> the reducer's container is discarded and must be started again from
>> scratch?
>
> No, by preempted, I mean that the current reduce tasks are killed. And
> because MapReduce tolerates an arbitrary number of killed task-attempts (as
> opposed to failed task-attempts), this is okay. So yes, the reducers, when
> they get rescheduled, will start all over again.
>
>> Do you know of any doc on the specifics of task scheduling? Would you
>> say that the example I gave is in line with how scheduling is
>> intended?
>
> We don't have docs on task-level scheduling, but you can look at
> RMContainerAllocator.java and related classes in MRAppMaster (i.e. the
> hadoop-mapreduce-client-app/ module) for understanding this.
>
> And no, like I mentioned before, scheduling isn't random, but maps first, and
> a slow reduce ramp-up as reducers finish.
>
>> FYI: the starvation issue is a known bug
>> (https://issues.apache.org/jira/browse/MAPREDUCE-4299).
>
> I mistakenly thought you were using the capacity-scheduler. There were other
> such bugs in both the Fifo and capacity-schedulers, which got fixed (not sure
> of the fixed-version). We've tested the capacity-scheduler a lot more, if you
> pick up the latest version - 0.23.2/branch-0.23
>
> HTH
>
> +Vinod Kumar Vavilapalli
> Hortonworks Inc.
> http://hortonworks.com/
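
The preemption behaviour described in the quoted reply can be illustrated with a small, self-contained Java sketch. It is not the logic of RMContainerAllocator, only a toy model under stated assumptions: container sizes are expressed in megabytes of memory, enough reducers are killed to fit all pending maps, and only failed attempts (never killed ones) count against a task's attempt limit. The class name, method names, and parameters are all hypothetical.

```java
public class ReducePreemptionSketch {

    /**
     * Hypothetical helper: number of running reducers to kill so that all
     * pending map tasks fit into the freed memory. Assumes memory is the
     * only resource and every map/reduce container has a uniform size.
     */
    static int reducersToPreempt(int pendingMaps, int mapMemMb, int reduceMemMb,
                                 int freeMemMb, int runningReducers) {
        long neededMb = (long) pendingMaps * mapMemMb - freeMemMb;
        if (neededMb <= 0) {
            return 0; // pending maps already fit; nothing to preempt
        }
        // Ceiling division: a partially needed reducer container must be freed whole.
        int toKill = (int) ((neededMb + reduceMemMb - 1) / reduceMemMb);
        // Cannot kill more reducers than are currently running.
        return Math.min(toKill, runningReducers);
    }

    /**
     * Hypothetical bookkeeping for one task: killed attempts are tolerated
     * without limit, while failed attempts count toward a maximum.
     */
    static final class TaskAttempts {
        private final int maxFailedAttempts; // illustrative limit, an assumption
        private int failed;
        private int killed;

        TaskAttempts(int maxFailedAttempts) {
            this.maxFailedAttempts = maxFailedAttempts;
        }

        void recordKilled() { killed++; }

        void recordFailed() { failed++; }

        int killedCount() { return killed; }

        /** The task as a whole fails only once enough attempts have FAILED. */
        boolean taskHasFailed() { return failed >= maxFailedAttempts; }
    }

    public static void main(String[] args) {
        // 3 pending maps of 1024 MB, reducers of 2048 MB, 512 MB free, 10 reducers running.
        System.out.println("reducers to preempt: "
                + reducersToPreempt(3, 1024, 2048, 512, 10));

        TaskAttempts reducer = new TaskAttempts(4);
        reducer.recordKilled(); // preempted once; restarts from scratch later
        System.out.println("task failed after a kill? " + reducer.taskHasFailed());
    }
}
```

A preempted reducer in this model simply records a killed attempt and is rescheduled later, which is why repeated preemption never fails the job by itself.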