Re: Assigning reduce tasks to specific nodes
Jean-Marc Spaggiari 2012-12-08, 13:18
Hi Tsuyoshi,
For which version of Hadoop is that? I think it's for 0.2x.x, right? Because I'm not able to find this class in 1.0.x.

Thanks,

JM

2012/12/8, Tsuyoshi OZAWA <[EMAIL PROTECTED]>:
> Hi Hiroyuki,
>
> Lately I've changed the scheduler to improve Hadoop, so I may be able to help you.
>
> RMContainerAllocator#handleEvent decides which MapTasks are assigned to allocated containers.
> You can implement a semi-strict (best-effort allocation) mode by hacking
> there. Note, however, that allocation of containers is done
> by the ResourceManager. The MRAppMaster cannot control where containers
> are allocated, only which MapTasks go into them.
>
> If you have any questions, please ask me.
>
> Thanks,
> Tsuyoshi
>
>
> On Sat, Dec 8, 2012 at 4:51 AM, Jean-Marc Spaggiari
> <[EMAIL PROTECTED]> wrote:
>
>> Hi Hiroyuki,
>>
>> Have you made any progress on that?
>>
>> I'm also looking at a way to assign specific Map tasks to specific
>> nodes (I want the Map to run where the data is).
>>
>> JM
>>
>> 2012/12/1, Michael Segel <[EMAIL PROTECTED]>:
>> > I haven't thought about reducers, but in terms of mappers you need to
>> > override the data locality so that the scheduler thinks the data lives
>> > on the node where you want the task to run.
>> > Again, not really recommended, since it will kill performance unless the
>> > compute time is at least an order of magnitude greater than the time it
>> > takes to transfer the data.
>> >
>> > Really, really don't recommend it...
>> >
>> > We did it as a hack, just to see if we could do it and get better
>> > overall performance for a specific job.
>> >
>> >
>> > On Dec 1, 2012, at 6:27 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>> >
>> >> Yes, scheduling is done on a TaskTracker heartbeat basis, so it is
>> >> certainly possible to do absolutely strict scheduling (although be
>> >> aware of the case of failing/unavailable TaskTrackers).
>> >>
>> >> Mohit's suggestion is somewhat like what you want (delay scheduling
>> >> in the fair scheduler config), but setting it to very high values is a
>> >> bad idea (for jobs that don't need this).
>> >>
>> >> On Sat, Dec 1, 2012 at 4:11 PM, Hiroyuki Yamada <[EMAIL PROTECTED]>
>> >> wrote:
>> >>> Thank you all for the comments.
>> >>>
>> >>>> you ought to make sure your scheduler also does non-strict scheduling
>> >>>> of data local tasks for jobs that don't require such strictness
>> >>>
>> >>> I just want to make sure of one thing.
>> >>> If I write my own scheduler, is it possible to do "strict" scheduling?
>> >>>
>> >>> Thanks
>> >>>
>> >>> On Thu, Nov 29, 2012 at 1:56 PM, Mohit Anchlia
>> >>> <[EMAIL PROTECTED]> wrote:
>> >>>> Look at the locality delay parameter.
>> >>>>
>> >>>> On Nov 28, 2012, at 8:44 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>> >>>>
>> >>>>> None of the current schedulers are "strict" in the sense of "do not
>> >>>>> schedule the task if such a tasktracker is not available". That has
>> >>>>> never been a requirement for Map/Reduce programs, nor should it be.
>> >>>>>
>> >>>>> I feel if you want some code to run individually on all nodes for
>> >>>>> whatever reason, you may as well ssh into each one and start it
>> >>>>> manually with appropriate host-based parameters, etc., and then
>> >>>>> aggregate their results.
>> >>>>>
>> >>>>> Note that even if you get down to writing a scheduler for this (which
>> >>>>> I don't think is a good idea, but anyway), you ought to make sure your
>> >>>>> scheduler also does non-strict scheduling of data-local tasks for jobs
>> >>>>> that don't require such strictness, so that they complete quickly
>> >>>>> rather than wait around to be scheduled in a fixed manner.
>> >>>>>
>> >>>>> On Thu, Nov 29, 2012 at 6:00 AM, Hiroyuki Yamada
>> >>>>> <[EMAIL PROTECTED]> wrote:
>> >>>>>> Thank you all for the comments and advice.
>> >>>>>>
>> >>>>>> I know it is not recommended to assign mapper locations myself.
>> >>>>>> But there needs to be one mapper running on each node in some
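A minimal, standalone Java sketch of the strict versus best-effort choice debated in this thread, assuming the MRv1 picture Harsh describes, where the scheduler decides what to run each time a TaskTracker heartbeats in. `PinnedTask`, `HeartbeatAssigner`, and `onHeartbeat` are hypothetical names for illustration only; this is not Hadoop's scheduler API, and it ignores slots, multiple jobs, and failure handling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical type (not part of Hadoop): a task the job author would like
// to run on one particular host.
final class PinnedTask {
    final String id;
    final String preferredHost;

    PinnedTask(String id, String preferredHost) {
        this.id = id;
        this.preferredHost = preferredHost;
    }
}

// Hypothetical heartbeat-driven assigner (not Hadoop's API). Each time a
// worker node reports in, it decides which pending task, if any, that node
// should run.
final class HeartbeatAssigner {
    private final List<PinnedTask> pending = new ArrayList<>();
    private final boolean strict;

    HeartbeatAssigner(boolean strict) {
        this.strict = strict;
    }

    void submit(PinnedTask task) {
        pending.add(task);
    }

    // Called once per heartbeat from `host`; returns the task to launch there.
    Optional<PinnedTask> onHeartbeat(String host) {
        // First choice: a pending task pinned to the reporting host.
        for (int i = 0; i < pending.size(); i++) {
            if (pending.get(i).preferredHost.equals(host)) {
                return Optional.of(pending.remove(i));
            }
        }
        // Strict mode: leave the slot idle instead of running a task elsewhere.
        // If a pinned host is down, its tasks are simply never scheduled.
        if (strict || pending.isEmpty()) {
            return Optional.empty();
        }
        // Best-effort mode: run the oldest pending task on this host anyway.
        return Optional.of(pending.remove(0));
    }

    int pendingCount() {
        return pending.size();
    }
}
```

With `strict` set to true, a task pinned to a dead node waits forever, which is exactly the caveat Harsh raises about failing/unavailable TaskTrackers. With `strict` set to false, the assigner prefers locality but never waits for it, roughly the best-effort mode Tsuyoshi mentions.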
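The locality delay Mohit points to is a form of delay scheduling: a job briefly turns down non-local slots in the hope that a node holding its data heartbeats in soon. The Java sketch below is a simplified illustration of that idea, not the fair scheduler's implementation. `LocalTask`, `DelayScheduledJob`, `offer`, and `maxSkips` are hypothetical names, and the sketch counts skipped heartbeats rather than elapsed time and ignores rack locality.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical type (not part of Hadoop): a pending task and the host that
// stores its input data.
final class LocalTask {
    final String id;
    final String dataHost;

    LocalTask(String id, String dataHost) {
        this.id = id;
        this.dataHost = dataHost;
    }
}

// Simplified delay scheduling for one job. When a node offers a slot and the
// job has no task whose data lives there, the job may decline up to
// `maxSkips` offers in a row before accepting a non-local slot.
final class DelayScheduledJob {
    private final List<LocalTask> pending = new ArrayList<>();
    private final int maxSkips;
    private int skipped = 0;

    DelayScheduledJob(int maxSkips) {
        this.maxSkips = maxSkips;
    }

    void addTask(LocalTask task) {
        pending.add(task);
    }

    // Returns the task to launch on `host`, or empty if the job declines the slot.
    Optional<LocalTask> offer(String host) {
        if (pending.isEmpty()) {
            return Optional.empty();
        }
        for (int i = 0; i < pending.size(); i++) {
            if (pending.get(i).dataHost.equals(host)) {
                skipped = 0; // locality achieved: reset the wait
                return Optional.of(pending.remove(i));
            }
        }
        if (skipped < maxSkips) {
            skipped++; // decline and wait for a data-local heartbeat
            return Optional.empty();
        }
        skipped = 0; // waited long enough: give up on locality for one task
        return Optional.of(pending.remove(0));
    }

    int pendingCount() {
        return pending.size();
    }
}
```

Raising `maxSkips` toward infinity makes the job effectively strict, which is what Hiroyuki asked about. It also shows why Harsh warns against very high delay values for jobs that don't need them: every declined offer is a slot left idle.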