Re: Assigning reduce tasks to specific nodes
Jean-Marc Spaggiari 2012-12-07, 19:51
Hi Hiroyuki,
Have you made any progress on that?

I'm also looking for a way to assign specific Map tasks to specific
nodes (I want the Map to run where the data is).

JM

2012/12/1, Michael Segel <[EMAIL PROTECTED]>:
> I haven't thought about reducers but in terms of mappers you need to
> override the data locality so that it thinks that the node where you want to
> send the data exists.
> Again, not really recommended since it will kill performance unless the
> compute time is at least an order of magnitude greater than the time it
> takes to transfer the data.
>
> Really, really don't recommend it....
>
> We did it as a hack, just to see if we could do it and get better overall
> performance for a specific job.
>
>
> On Dec 1, 2012, at 6:27 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>
>> Yes, scheduling is done on a Tasktracker heartbeat basis, so it is
>> certainly possible to do absolutely strict scheduling (although be
>> aware of the condition of failing/unavailable tasktrackers).
>>
>> Mohit's suggestion is somewhat like what you desire (delay scheduling
>> in fair scheduler config) - but setting it to very high values is bad
>> to do (for jobs that don't need this).
>>
>> On Sat, Dec 1, 2012 at 4:11 PM, Hiroyuki Yamada <[EMAIL PROTECTED]>
>> wrote:
>>> Thank you all for the comments.
>>>
>>>> you ought to make sure your scheduler also does non-strict scheduling of
>>>> data local tasks for jobs that don't require such strictness
>>>
>>> I just want to make sure of one thing.
>>> If I write my own scheduler, is it possible to do "strict" scheduling?
>>>
>>> Thanks
>>>
>>> On Thu, Nov 29, 2012 at 1:56 PM, Mohit Anchlia <[EMAIL PROTECTED]>
>>> wrote:
>>>> Look at locality delay parameter
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Nov 28, 2012, at 8:44 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> None of the current schedulers are "strict" in the sense of "do not
>>>>> schedule the task if such a tasktracker is not available". That has
>>>>> never been a requirement for Map/Reduce programs, nor should it be.
>>>>>
>>>>> I feel if you want some code to run individually on all nodes for
>>>>> whatever reason, you may as well ssh into each one and start it
>>>>> manually with appropriate host-based parameters, etc., and then
>>>>> aggregate their results.
>>>>>
>>>>> Note that even if you get down to writing a scheduler for this (which
>>>>> I don't think is a good idea, but anyway), you ought to make sure your
>>>>> scheduler also does non-strict scheduling of data local tasks for jobs
>>>>> that don't require such strictness - in order for them to complete
>>>>> quickly rather than wait around for scheduling in a fixed manner.
>>>>>
>>>>> On Thu, Nov 29, 2012 at 6:00 AM, Hiroyuki Yamada <[EMAIL PROTECTED]>
>>>>> wrote:
>>>>>> Thank you all for the comments and advice.
>>>>>>
>>>>>> I know it is not recommended to assign mapper locations myself.
>>>>>> But there needs to be one mapper running on each node in some cases,
>>>>>> so I need a strict way to do it.
>>>>>>
>>>>>> So, locations are taken care of by the JobTracker (scheduler), but it
>>>>>> is not strict.
>>>>>> And the only way to do it strictly is to write my own scheduler, right?
>>>>>>
>>>>>> I have checked the source and I am not sure where to modify it to do this.
>>>>>> What I understand is that FairScheduler and the others are for
>>>>>> scheduling multiple jobs. Is this right?
>>>>>> What I want to do is schedule tasks within one job.
>>>>>> Can this be achieved with FairScheduler and the others?
>>>>>>
>>>>>> Regards,
>>>>>> Hiroyuki
>>>>>>
>>>>>> On Thu, Nov 29, 2012 at 12:46 AM, Michael Segel
>>>>>> <[EMAIL PROTECTED]> wrote:
>>>>>>> Mappers? Uhm... yes you can do it.
>>>>>>> Yes it is non-trivial.
>>>>>>> Yes, it is not recommended.
>>>>>>>
>>>>>>> I think we talk a bit about this in an InfoQ article written by
>>>>>>> Boris Lublinsky.
>>>>>>>
>>>>>>> It's kind of wild when your entire cluster map goes red in ganglia...
>>>>>>> :-)
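
To make the trade-off discussed in this thread concrete, here is a toy model of heartbeat-driven task assignment. It is only a sketch: it does not use Hadoop's scheduler API, and every name in it (`Task`, `assign_on_heartbeat`, `simulate`, the `locality_delay` argument) is hypothetical. It assumes one task slot per heartbeat and counts the delay in skipped heartbeats rather than in time. It compares "strict" placement (a task runs only on its preferred host) with a delay-scheduling style fallback (a task accepts another host after being skipped a number of times), including the case Harsh J warns about where the preferred tasktracker never reports in.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Task:
    task_id: str
    preferred_host: str  # host this task is supposed to run on (illustrative)
    waited: int = 0      # number of heartbeats that passed this task over


def assign_on_heartbeat(host: str, pending: List[Task],
                        locality_delay: Optional[int]) -> Optional[Task]:
    """Choose a task for the tracker on `host` that just sent a heartbeat.

    locality_delay=None -> strict: only a task preferring `host` is handed out.
    locality_delay=N    -> a task accepts any host once it was skipped N times.
    """
    # A task whose preferred host matches always wins.
    for task in pending:
        if task.preferred_host == host:
            pending.remove(task)
            return task
    # Non-strict mode: release a task that has waited long enough.
    if locality_delay is not None:
        for task in pending:
            if task.waited >= locality_delay:
                pending.remove(task)
                return task
    # Nothing assigned: every pending task was skipped by this heartbeat.
    for task in pending:
        task.waited += 1
    return None


def simulate(heartbeats: List[str], tasks: List[Task],
             locality_delay: Optional[int]) -> Tuple[Dict[str, str], List[str]]:
    """Replay a heartbeat sequence; return (placements, ids still pending)."""
    # Work on fresh copies so repeated simulations do not share wait counters.
    pending = [Task(t.task_id, t.preferred_host) for t in tasks]
    placement: Dict[str, str] = {}
    for host in heartbeats:
        task = assign_on_heartbeat(host, pending, locality_delay)
        if task is not None:
            placement[task.task_id] = host
    return placement, [t.task_id for t in pending]


if __name__ == "__main__":
    tasks = [Task("t1", "node1"), Task("t2", "node2"), Task("t3", "node3")]
    beats = ["node1", "node2"] * 3  # node3 never sends a heartbeat
    print("strict :", simulate(beats, tasks, None))
    print("delay=2:", simulate(beats, tasks, 2))
```

In this model the strict run leaves `t3` pending for as long as `node3` stays silent, while the run with a delay of 2 eventually places it on another node. That is the reason given above for keeping non-strict behaviour for jobs that do not need strictness, and for not setting the delay to very high values.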