Re: Assigning reduce tasks to specific nodes
Harsh J 2012-12-01, 12:27
Yes, scheduling is done on a Tasktracker heartbeat basis, so it is
certainly possible to do absolutely strict scheduling (although be
aware of the condition of failing/unavailable tasktrackers).

Mohit's suggestion is somewhat like what you desire (delay scheduling
in fair scheduler config) - but setting it to very high values is bad
to do (for jobs that don't need this).

On Sat, Dec 1, 2012 at 4:11 PM, Hiroyuki Yamada <[EMAIL PROTECTED]> wrote:
> Thank you all for the comments.
>
> >> you ought to make sure your scheduler also does non-strict scheduling
> >> of data local tasks for jobs that don't require such strictness
>
> I just want to make sure of one thing.
> If I write my own scheduler, is it possible to do "strict" scheduling?
>
> Thanks
>
> On Thu, Nov 29, 2012 at 1:56 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>> Look at locality delay parameter
>>
>> Sent from my iPhone
>>
>> On Nov 28, 2012, at 8:44 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>>
>>> None of the current schedulers are "strict" in the sense of "do not
>>> schedule the task if such a tasktracker is not available". That has
>>> never been a requirement for Map/Reduce programs, nor should it be.
>>>
>>> I feel if you want some code to run individually on all nodes for
>>> whatever reason, you may as well ssh into each one and start it
>>> manually with appropriate host-based parameters, etc., and then
>>> aggregate their results.
>>>
>>> Note that even if you get down to writing a scheduler for this (which
>>> I don't think is a good idea, but anyway), you ought to make sure your
>>> scheduler also does non-strict scheduling of data local tasks for jobs
>>> that don't require such strictness - in order for them to complete
>>> quickly rather than wait around for scheduling in a fixed manner.
>>>
>>> On Thu, Nov 29, 2012 at 6:00 AM, Hiroyuki Yamada <[EMAIL PROTECTED]> wrote:
>>>> Thank you all for the comments and advice.
>>>>
>>>> I know it is not recommended to assign mapper locations by myself.
>>>> But there needs to be one mapper running on each node in some cases,
>>>> so I need a strict way to do it.
>>>>
>>>> So, locations are taken care of by the JobTracker (scheduler), but not strictly.
>>>> And the only way to do it strictly is writing my own scheduler, right?
>>>>
>>>> I have checked the source and I am not sure where to modify it to do this.
>>>> What I understand is that FairScheduler and the others are for scheduling
>>>> multiple jobs. Is this right?
>>>> What I want to do is schedule tasks within one job.
>>>> Can this be achieved by FairScheduler and the others?
>>>>
>>>> Regards,
>>>> Hiroyuki
>>>>
>>>> On Thu, Nov 29, 2012 at 12:46 AM, Michael Segel
>>>> <[EMAIL PROTECTED]> wrote:
>>>>> Mappers? Uhm... yes you can do it.
>>>>> Yes, it is non-trivial.
>>>>> Yes, it is not recommended.
>>>>>
>>>>> I think we talk a bit about this in an InfoQ article written by Boris
>>>>> Lublinsky.
>>>>>
>>>>> It's kind of wild when your entire cluster map goes red in ganglia... :-)
>>>>>
>>>>>
>>>>> On Nov 28, 2012, at 2:41 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Mapper scheduling is indeed influenced by the getLocations() returned
>>>>> results of the InputSplit.
>>>>>
>>>>> The map task itself does not care about deserializing the location
>>>>> information, as it is of no use to it. The location information is vital to
>>>>> the scheduler (or in 0.20.2, the JobTracker), where it is sent directly
>>>>> when a job is submitted. The locations are used pretty well here.
>>>>>
>>>>> You should be able to control (or rather, influence) mapper placement by
>>>>> working with the InputSplits, but not strictly so, because in the end it's up
>>>>> to your MR scheduler to do data local or non data local assignments.
>>>>>
>>>>>
>>>>> On Wed, Nov 28, 2012 at 11:39 AM, Hiroyuki Yamada <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>> Hi Harsh,
>>>>>>
>>>>>> Thank you for the information.
>>>>>> I understand the current circumstances.
>>>>>>
>>>>>> How about for mappers?

Harsh J
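
As a rough illustration of the heartbeat point made at the top of this thread: since tasks are handed out when a tracker heartbeats in, a scheduler can either refuse to give a task to a non-preferred host (strict), or give it away only after the task has been passed over some number of times (in the spirit of delay scheduling). The following is a self-contained toy model in Java and is not Hadoop code; the class HeartbeatSchedulingSketch, the Task type, assign() and the maxSkips parameter are all hypothetical names made up for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

/** Toy model only: none of these names are Hadoop APIs. */
public class HeartbeatSchedulingSketch {

    /** A pending task and the hosts it prefers (analogous to split locations). */
    static final class Task {
        final String name;
        final Set<String> preferredHosts;
        int skippedHeartbeats = 0; // times this task was passed over on a non-preferred host

        Task(String name, String... hosts) {
            this.name = name;
            this.preferredHosts = new HashSet<>(Arrays.asList(hosts));
        }
    }

    /**
     * Called once per heartbeat from {@code host}; returns the task to run there, or null.
     * A negative maxSkips means strict: never place a task on a non-preferred host.
     */
    static Task assign(List<Task> pending, String host, int maxSkips) {
        // First pass: hand out a task that lists this host as preferred.
        for (Iterator<Task> it = pending.iterator(); it.hasNext();) {
            Task t = it.next();
            if (t.preferredHosts.contains(host)) {
                it.remove();
                return t;
            }
        }
        // Second pass: hand out a non-local task only if it has waited long enough.
        for (Iterator<Task> it = pending.iterator(); it.hasNext();) {
            Task t = it.next();
            if (maxSkips >= 0 && t.skippedHeartbeats >= maxSkips) {
                it.remove();
                return t;
            }
            t.skippedHeartbeats++;
        }
        return null;
    }

    static String describe(Task t) {
        return t == null ? "nothing assigned" : t.name;
    }

    public static void main(String[] args) {
        List<Task> pending = new ArrayList<>();
        pending.add(new Task("map-0", "node2"));
        // Strict mode: node1 is never given the task, however often it heartbeats.
        for (int i = 0; i < 3; i++) {
            System.out.println("node1 heartbeat: " + describe(assign(pending, "node1", -1)));
        }
        System.out.println("node2 heartbeat: " + describe(assign(pending, "node2", -1)));
    }
}
```

In strict mode a task whose preferred host never heartbeats stays pending forever, which is the failing/unavailable tasktracker caveat mentioned above. A very large maxSkips behaves almost the same way, which is one way to see why a very high delay hurts jobs that do not need strict placement.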