Re: Assigning reduce tasks to specific nodes
Hiroyuki Yamada 2012-12-01, 10:41
Thank you all for the comments.
> you ought to make sure your scheduler also does non-strict scheduling of data local tasks for jobs that don't require such strictness

I just want to make sure of one thing.
If I write my own scheduler, is it possible to do "strict" scheduling?

Thanks

On Thu, Nov 29, 2012 at 1:56 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> Look at locality delay parameter
>
> Sent from my iPhone
>
> On Nov 28, 2012, at 8:44 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>
>> None of the current schedulers are "strict" in the sense of "do not
>> schedule the task if such a tasktracker is not available". That has
>> never been a requirement for Map/Reduce programs and nor should be.
>>
>> I feel if you want some code to run individually on all nodes for
>> whatever reason, you may as well ssh into each one and start it
>> manually with appropriate host-based parameters, etc.. and then
>> aggregate their results.
>>
>> Note that even if you get down to writing a scheduler for this (which
>> I don't think is a good idea, but anyway), you ought to make sure your
>> scheduler also does non-strict scheduling of data local tasks for jobs
>> that don't require such strictness - in order for them to complete
>> quickly than wait around for scheduling in a fixed manner.
>>
>> On Thu, Nov 29, 2012 at 6:00 AM, Hiroyuki Yamada <[EMAIL PROTECTED]> wrote:
>>> Thank you all for the comments and advices.
>>>
>>> I know it is not recommended to assigning mapper locations by myself.
>>> But There needs to be one mapper running in each node in some cases,
>>> so I need a strict way to do it.
>>>
>>> So, locations is taken care of by JobTracker(scheduler), but it is not strict.
>>> And, the only way to do it strictly is making a own scheduler, right ?
>>>
>>> I have checked the source and I am not sure where to modify to do it.
>>> What I understand is FairScheduler and others are for scheduling
>>> multiple jobs. Is this right ?
>>> What I want to do is scheduling tasks in one job.
>>> This can be achieved by FairScheduler and others ?
>>>
>>> Regards,
>>> Hiroyuki
>>>
>>> On Thu, Nov 29, 2012 at 12:46 AM, Michael Segel
>>> <[EMAIL PROTECTED]> wrote:
>>>> Mappers? Uhm... yes you can do it.
>>>> Yes it is non-trivial.
>>>> Yes, it is not recommended.
>>>>
>>>> I think we talk a bit about this in an InfoQ article written by Boris
>>>> Lublinsky.
>>>>
>>>> Its kind of wild when your entire cluster map goes red in ganglia... :-)
>>>>
>>>>
>>>> On Nov 28, 2012, at 2:41 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Mapper scheduling is indeed influenced by the getLocations() returned
>>>> results of the InputSplit.
>>>>
>>>> The map task itself does not care about deserializing the location
>>>> information, as it is of no use to it. The location information is vital to
>>>> the scheduler (or in 0.20.2, the JobTracker), where it is sent to directly
>>>> when a job is submitted. The locations are used pretty well here.
>>>>
>>>> You should be able to control (or rather, influence) mapper placement by
>>>> working with the InputSplits, but not strictly so, cause in the end its up
>>>> to your MR scheduler to do data local or non data local assignments.
>>>>
>>>>
>>>> On Wed, Nov 28, 2012 at 11:39 AM, Hiroyuki Yamada <[EMAIL PROTECTED]>
>>>> wrote:
>>>>>
>>>>> Hi Harsh,
>>>>>
>>>>> Thank you for the information.
>>>>> I understand the current circumstances.
>>>>>
>>>>> How about for mappers ?
>>>>> As far as I tested, location information in InputSplit is ignored in
>>>>> 0.20.2,
>>>>> so there seems no easy way for assigning mappers to specific nodes.
>>>>> (I before checked the source and noticed that
>>>>> location information is not restored when deserializing the InputSplit
>>>>> instance.)
>>>>>
>>>>> Thanks,
>>>>> Hiroyuki
>>>>>
>>>>> On Wed, Nov 28, 2012 at 2:08 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>>>>>> This is not supported/available currently even in MR2, but take a look
>>>>>> at
>>>>>> https://issues.apache.org/jira/browse/MAPREDUCE-199.
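
The difference between "strict" and "non-strict" placement discussed in this thread can be illustrated with a small toy model. This is only a sketch and not Hadoop code: `LocalitySketch`, `Task`, `assign`, and `maxDeclines` are hypothetical names made up for illustration, and the "decline a few non-local offers, then accept one" rule is just loosely in the spirit of the locality delay idea mentioned above, not a description of how any real scheduler implements it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Optional;

// Toy model only: none of these names are Hadoop classes or parameters.
final class LocalitySketch {

    /** A pending task and the host it would like to run on (hypothetical type). */
    static final class Task {
        final String name;
        final String preferredHost;
        int declinedOffers = 0; // non-local slots this task has passed up so far

        Task(String name, String preferredHost) {
            this.name = name;
            this.preferredHost = preferredHost;
        }
    }

    /**
     * Chooses a task for a free slot on host and removes it from pending.
     * maxDeclines < 0 models strict placement: preferred host or nothing.
     * maxDeclines >= 0 models non-strict placement: a task accepts a
     * non-local slot once it has declined that many non-local offers.
     */
    static Optional<Task> assign(List<Task> pending, String host, int maxDeclines) {
        // Pass 1: a task whose preferred host matches always wins.
        for (Iterator<Task> it = pending.iterator(); it.hasNext();) {
            Task t = it.next();
            if (t.preferredHost.equals(host)) {
                it.remove();
                return Optional.of(t);
            }
        }
        if (maxDeclines < 0) {
            return Optional.empty(); // strict: leave the slot idle
        }
        // Pass 2: give the slot to the first task that has waited long enough;
        // tasks that are still waiting record one more declined offer.
        for (Iterator<Task> it = pending.iterator(); it.hasNext();) {
            Task t = it.next();
            if (t.declinedOffers >= maxDeclines) {
                it.remove();
                return Optional.of(t);
            }
            t.declinedOffers++;
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Task> pending = new ArrayList<>();
        pending.add(new Task("map-0", "node-a"));

        // Strict: node-b never receives map-0, however often it asks.
        System.out.println(assign(pending, "node-b", -1).isPresent()); // false
        // Non-strict, one decline allowed: first offer passed up, second taken.
        System.out.println(assign(pending, "node-b", 1).isPresent());  // false
        System.out.println(assign(pending, "node-b", 1).isPresent());  // true
    }
}
```

In this model the strict mode can leave a task pending forever if its preferred node never offers a slot, which is the trade-off raised earlier in the thread: jobs that do not need strictness finish sooner when the scheduler is allowed to fall back to a non-local slot.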