Vasco Visser, 2012-08-30, 17:41
Vinod Kumar Vavilapalli, 2012-08-30, 18:19
Vasco Visser, 2012-08-30, 23:38
祝美祺, 2012-08-31, 02:07
Vinod Kumar Vavilapalli, 2012-08-31, 03:51
Vasco Visser, 2012-08-31, 11:17
Vinod Kumar Vavilapalli, 2012-08-31, 22:59
Vasco Visser, 2012-09-02, 16:41
-
Questions with regard to scheduling of map and reduce tasks
Vasco Visser, 2012-08-30, 17:41
Hi,
When running a job with more reducers than there are containers available in the cluster, all reducers get scheduled, leaving no containers for the mappers. The result is starvation: the job never finishes. Should this be considered a bug, or is it expected behavior? The workaround is to limit the number of reducers to fewer than the number of available containers.

Also, it seems that tasks are picked at random from the combined pool of pending map and reduce tasks and then scheduled. This leads to suboptimal behavior. For example, I run a job with 500 mappers and 30 reducers (my cluster has only 16 machines, with two containers per machine since they are dual-core machines). What I observe is that halfway through the job all reduce tasks are scheduled, leaving only one container for 200+ map tasks. Again, is this expected behavior? If so, what is the idea behind it? And are the map and reduce tasks indeed scheduled randomly, or does it only look that way?

Any advice is welcome.

Regards,
Vasco
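The container arithmetic behind this example can be checked in a few lines. A minimal sketch, assuming uniform nodes with exactly two same-sized containers each, and one container taken by the MR ApplicationMaster (under YARN the AM runs in a container of its own); the class and method names are made up for illustration.

```java
// Back-of-the-envelope container budget for the example job.
// Assumptions: uniform nodes, all containers the same size, and the
// MR ApplicationMaster occupying one container for the whole job.
final class ContainerBudget {
    private ContainerBudget() {}

    /** Total containers the cluster can run at once. */
    static int totalContainers(int nodes, int containersPerNode) {
        return nodes * containersPerNode;
    }

    /** Containers left for map tasks once the AM and all reducers hold theirs. */
    static int freeForMaps(int nodes, int containersPerNode, int runningReducers) {
        int amContainers = 1; // the MR ApplicationMaster's own container
        int free = totalContainers(nodes, containersPerNode) - amContainers - runningReducers;
        return Math.max(0, free); // cannot go below zero
    }

    public static void main(String[] args) {
        System.out.println(totalContainers(16, 2));
        System.out.println(freeForMaps(16, 2, 30));
    }
}
```

With 30 reducers the sketch leaves exactly one container (32 - 1 - 30) for the 200+ remaining maps, matching the observation above; with 31 or more reducers it leaves none, which is the starvation case.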
-
Re: Questions with regard to scheduling of map and reduce tasks
Vinod Kumar Vavilapalli, 2012-08-30, 18:19
Since you mentioned containers, I assume you are using Hadoop 2.0.*. Replies inline.

> When running a job with more reducers than containers available in the
> cluster all reducers get scheduled, leaving no containers available
> for the mappers to be scheduled. The result is starvation and the job
> never finishes. Is this to be considered a bug or is it expected
> behavior? The workaround is to limit the number of reducers to less
> than the number of containers available.

No, you don't need to limit the reducers yourself. The MR ApplicationMaster is smart enough to figure out the available cluster/queue capacity and schedule maps and reduces accordingly. If it ever runs into a situation where it has outstanding maps but reduces happen to occupy all available resources, it will preempt reduces and start running maps.

> Also, it seems that from the combined pool of pending map and reduce
> tasks, randomly tasks are picked and scheduled. This causes less than
> optimal behavior. For example, I run a task with 500 mappers and 30
> reducers (my cluster has only 16 machines, two containers per machine
> (dual-core machines)). What I observe is that half way through the job
> all reduce tasks are scheduled, leaving only one container for 200+
> map tasks. Again, is this expected behavior? If so, what is the idea
> behind it? And, are the map and reduce task indeed randomly scheduled
> or does it only look like they are?

No, again, the MR ApplicationMaster is smart and the scheduling isn't random. It runs maps first and slowly ramps up reduces as maps finish.

HTH
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/
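The preemption rule in the reply above boils down to a check like the following. This is a simplified model, not the RMContainerAllocator code: it counts containers rather than memory, assumes every container has the same size, and its names are hypothetical.

```java
// Simplified model of when the MR AM should give up running reducers so
// that pending maps can make progress. Assumption: all containers are the
// same size, so resources are counted in containers, not memory.
final class PreemptionCheck {
    private PreemptionCheck() {}

    /**
     * @param pendingMaps     map tasks still waiting for a container
     * @param freeContainers  containers the job could still obtain (headroom)
     * @param runningReducers reducers currently holding containers
     * @return true if reducers should be preempted (killed) to make room for maps
     */
    static boolean shouldPreemptReducers(int pendingMaps, int freeContainers, int runningReducers) {
        boolean mapsWaiting = pendingMaps > 0;
        boolean noRoomForMaps = freeContainers == 0;
        boolean reducersHoldResources = runningReducers > 0;
        return mapsWaiting && noRoomForMaps && reducersHoldResources;
    }

    public static void main(String[] args) {
        // Starvation case: maps waiting, cluster full, reducers hold everything.
        System.out.println(shouldPreemptReducers(300, 0, 31));
        // Maps can still get a container, so nothing needs to be preempted.
        System.out.println(shouldPreemptReducers(300, 1, 30));
    }
}
```

The check matters because the situation cannot resolve itself: a reducer cannot finish its shuffle until every map has produced its output, so reducers holding all containers would wait forever on maps that can never start.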
-
Re: Questions with regard to scheduling of map and reduce tasks
Vasco Visser, 2012-08-30, 23:38
FYI: the starvation issue is a known bug
(https://issues.apache.org/jira/browse/MAPREDUCE-4299).

I am still interested in answers to the questions regarding the scheduling, though. If anyone can share some info on that, it is much appreciated.

regards, Vasco
-
Re: Questions with regard to scheduling of map and reduce tasks
祝美祺, 2012-08-31, 02:07
Unsubscribe
-
Re: Questions with regard to scheduling of map and reduce tasks
Vinod Kumar Vavilapalli, 2012-08-31, 03:51
> 0.23.1 with Pig 0.10.0 on top.

Ok.

> How is the preemption supposed to work? Is a single reducer supposed
> to be preempted, or will a batch of reducers be preempted?

A batch of reducers. Enough reducers will be killed to accommodate any/all pending map tasks.

> Also, when you say preemption, do you mean that the current execution
> of a reducer is actually paused and resumed again later? Or does
> preemption mean that the reducer's container is discarded and the
> reducer must be started again from scratch?

No, by preempted I mean that the current reduce tasks are killed. Because MapReduce tolerates an arbitrary number of killed task attempts (as opposed to failed task attempts), this is okay. So yes, when the reducers get rescheduled they will start all over again.

> Do you know of any docs on the specifics of task scheduling? Would you
> say that the example I gave is in line with how scheduling is
> intended?

We don't have docs on task-level scheduling, but you can look at RMContainerAllocator.java and related classes in the MRAppMaster (i.e. the hadoop-mapreduce-client-app/ module) to understand this.

And no, as I mentioned before, scheduling isn't random: maps go first, with a slow reduce ramp-up as maps finish.

> FYI: the starvation issue is a known bug
> (https://issues.apache.org/jira/browse/MAPREDUCE-4299).

I mistakenly assumed you were using the capacity-scheduler. There were other such bugs in both the Fifo and capacity-schedulers which have since been fixed (not sure of the fixed version). We've tested the capacity-scheduler a lot more; I suggest picking up the latest version, 0.23.2/branch-0.23.

HTH
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/
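The two points above (batch preemption, and killed versus failed attempts) can be modeled as follows. A sketch under stated assumptions: memory is counted in MB, preemption simply kills as many reducers as the pending maps need, and `maxFailedAttempts` is a hypothetical per-task limit; real versions of the AM may cap or pace preemption differently.

```java
import java.util.List;

// Simplified model of two points from the reply above:
//  1. preemption kills a batch of reducers, sized to fit the pending maps;
//  2. killed attempts do not count against a task's failure limit.
// Memory is in MB. All names are hypothetical, not Hadoop's.
final class ReducerPreemption {
    private ReducerPreemption() {}

    /** Number of running reducers to kill so that all pending maps fit. */
    static int reducersToKill(int pendingMaps, int mapMb, int reduceMb,
                              int freeMb, int runningReducers) {
        long neededMb = (long) pendingMaps * mapMb - freeMb;
        if (neededMb <= 0 || runningReducers == 0) {
            return 0; // maps already fit, or there is nothing to preempt
        }
        long kills = (neededMb + reduceMb - 1) / reduceMb; // ceiling division
        return (int) Math.min(kills, runningReducers);
    }

    enum AttemptOutcome { SUCCEEDED, FAILED, KILLED }

    /** True once FAILED attempts reach the limit; KILLED attempts are ignored. */
    static boolean taskGivesUp(List<AttemptOutcome> attempts, int maxFailedAttempts) {
        long failed = attempts.stream().filter(a -> a == AttemptOutcome.FAILED).count();
        return failed >= maxFailedAttempts;
    }

    public static void main(String[] args) {
        // 3 pending 1 GB maps, no free memory, 2 GB reducers: kill a batch of 2.
        System.out.println(reducersToKill(3, 1024, 2048, 0, 30));
        // A reducer preempted five times is still retried...
        List<AttemptOutcome> preempted = List.of(AttemptOutcome.KILLED, AttemptOutcome.KILLED,
                AttemptOutcome.KILLED, AttemptOutcome.KILLED, AttemptOutcome.KILLED);
        System.out.println(taskGivesUp(preempted, 4));
        // ...whereas four genuine failures exhaust the limit.
        System.out.println(taskGivesUp(List.of(AttemptOutcome.FAILED, AttemptOutcome.FAILED,
                AttemptOutcome.FAILED, AttemptOutcome.FAILED), 4));
    }
}
```

Because a preempted reducer is simply killed, its next attempt starts from scratch, but the kill never pushes the task toward being declared failed.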
-
Re: Questions with regard to scheduling of map and reduce tasks
Vasco Visser, 2012-08-31, 11:17
Thanks again for the reply, it is becoming clear.
While we're on the subject of going over the code, do you happen to know where the code is that creates resource requests according to the locations of HDFS blocks? I am looking for it, but the protocol buffer stuff makes it difficult for me to understand what is going on.

regards, Vasco
-
Re: Questions with regard to scheduling of map and reduce tasks
Vinod Kumar Vavilapalli, 2012-08-31, 22:59
You don't need to touch the protocol-buffer record code at all, as there are Java-native interfaces for everything, e.g. org.apache.hadoop.yarn.api.AMRMProtocol.

Regarding your question: the JobClient first obtains the locations of DFS blocks via InputFormat.getSplits() and uploads the accumulated information into a split file; see Job.submitInternal() -> JobSubmitter.writeSplits() -> ... The MR AM then downloads and reads the split file, reconstructs the split information, and creates TaskAttempts (TAs), which then use it to request containers. See JobImpl.InitTransition in the MRAppMaster code for how TAs are created with host information.

HTH,
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/
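A rough sketch of what "TaskAttempts use the host information to request containers" amounts to: every map asks for a container on any host holding its split, and the same ask is also counted at rack level and at `*` (anywhere) so the scheduler can relax locality. YARN resource requests are keyed by host, rack and `*` in this way, but the class, method and rack mapping below are hypothetical, and the real AM code differs in detail.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Sketch: aggregate per-split block locations into container asks keyed by
// host, rack and "*" (anywhere). Class, method and rack mapping are
// hypothetical, not Hadoop's API.
final class LocalityRequests {
    static final String ANY = "*";

    private LocalityRequests() {}

    /**
     * @param splitHosts split name -> hosts holding that split's data
     * @param rackOf     resolves a host name to its rack name
     * @return number of containers asked for at each location
     */
    static Map<String, Integer> containerAsks(Map<String, List<String>> splitHosts,
                                              Function<String, String> rackOf) {
        Map<String, Integer> asks = new TreeMap<>();
        for (List<String> hosts : splitHosts.values()) {
            // One map task per split: it may run on any host with a replica...
            for (String host : hosts) {
                asks.merge(host, 1, Integer::sum);
            }
            // ...or on any node of those racks (each rack counted once per split)...
            hosts.stream().map(rackOf).distinct()
                 .forEach(rack -> asks.merge(rack, 1, Integer::sum));
            // ...or, as a last resort, anywhere in the cluster.
            asks.merge(ANY, 1, Integer::sum);
        }
        return asks;
    }

    public static void main(String[] args) {
        Map<String, List<String>> splitHosts = Map.of(
                "split-0", List.of("node1", "node2"),
                "split-1", List.of("node2", "node3"));
        Function<String, String> rackOf = h -> h.equals("node3") ? "/rack2" : "/rack1";
        System.out.println(containerAsks(splitHosts, rackOf));
    }
}
```

For the two splits in `main` this prints `{*=2, /rack1=2, /rack2=1, node1=1, node2=2, node3=1}`. The per-host numbers sum to more than the two maps because each is an independent preference; in this sketch only the `*` entry equals the number of containers actually wanted.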
-
Re: Questions with regard to scheduling of map and reduce tasks
Vasco Visser, 2012-09-02, 16:41
I am now running the 2.1.0 branch, where the FIFO starvation is solved. FYI, the behavior of task scheduling in this branch is as follows. It begins with all containers assigned to mappers. Reducers start being scheduled fairly quickly. From time to time more containers are given to reducers, until about 50% of the available containers are running reducers. It stays 50-50 until all mappers are scheduled. Only then is the proportion of containers allocated to reducers increased beyond 50%.
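The 50-50 pattern can be reproduced with a simplified model of reduce ramp-up. This is a reading of the idea counted in containers rather than memory, not the actual RMContainerAllocator code: the reducers' share grows with the fraction of completed maps but is capped at a ramp-up limit while maps remain, and any capacity the remaining maps cannot use goes to reducers. `rampUpLimit` is a hypothetical parameter; later Hadoop releases expose a similar setting (I believe `yarn.app.mapreduce.am.job.reduce.rampup.limit`, default 0.5), so check your version's mapred-default.xml. The model also omits the slow-start threshold `mapreduce.job.reduce.slowstart.completedmaps`, which delays the first reducer requests until that fraction of maps has completed.

```java
// Simplified model of reduce ramp-up, counted in containers rather than memory.
// Names (including rampUpLimit) are hypothetical; this is not Hadoop's code.
final class ReduceRampUp {
    private ReduceRampUp() {}

    /** Upper bound on containers reducers may hold at this point in the job. */
    static int reduceLimit(int totalContainers, int totalMaps, int completedMaps, float rampUpLimit) {
        float completedFraction = totalMaps == 0 ? 1f : (float) completedMaps / totalMaps;
        // Reducers' share grows with map progress but is capped at rampUpLimit.
        int idealReduce = (int) Math.min(completedFraction * totalContainers,
                                         rampUpLimit * totalContainers);
        int idealMap = totalContainers - idealReduce;
        int unfinishedMaps = totalMaps - completedMaps; // running or still pending
        if (idealMap > unfinishedMaps) {
            // Too few maps left to use their share: hand the slack to reducers.
            return idealReduce + (idealMap - unfinishedMaps);
        }
        return idealReduce;
    }

    public static void main(String[] args) {
        // 31 task containers (32 minus the AM), 500 maps, reducers capped at 50%.
        for (int done : new int[] {0, 100, 300, 490, 500}) {
            System.out.println(done + " maps done -> reducer limit "
                    + reduceLimit(31, 500, done, 0.5f));
        }
    }
}
```

With 31 task containers and 500 maps, the limit starts at 0, climbs with map progress (6 at 100 maps done), levels off at 15 of 31 containers, and only goes past half once fewer maps remain than their share (21 at 490 maps done, 31 at the end), which matches the behavior described above.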