Hadoop, mail # user - Re: Assigning reduce tasks to specific nodes


Re: Assigning reduce tasks to specific nodes
Jean-Marc Spaggiari 2012-12-08, 13:18
Hi Tsuyoshi,

For which version of Hadoop is that? I think it's for 0.2x.x, right?
I'm not able to find this class in 1.0.x.

Thanks,

JM

2012/12/8, Tsuyoshi OZAWA <[EMAIL PROTECTED]>:
> Hi Hiroyuki,
>
> I've recently been working on the scheduler to improve Hadoop, so I may
> be able to help you.
>
> RMContainerAllocator#handleEvent decides which MapTasks are assigned to
> allocated containers. You can implement a semi-strict (best-effort
> allocation) mode by hacking there. Note, however, that the allocation of
> containers is done by the ResourceManager. The MRAppMaster cannot control
> where containers are allocated, only where MapTasks are placed within the
> containers it receives.
>
> If you have any questions, please ask.
>
> Thanks,
> Tsuyoshi
>
>
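
[Archive note: the assignment step Tsuyoshi describes can be sketched in
simplified form. In real Hadoop it would live in the MRAppMaster's
RMContainerAllocator#handleEvent; the class and method below are
illustrative stand-ins, not actual Hadoop API.]

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of a "strict" MapTask-to-container assignment step.
// Everything here is an illustrative stand-in for logic that, in real
// Hadoop, would be hooked into RMContainerAllocator#handleEvent.
class StrictAssigner {

    /**
     * Assigns each pending task (taskId -> preferred host) to a container
     * on that host. Strict mode: a task whose preferred host received no
     * container stays unassigned (the returned map simply omits it), to be
     * retried on a later allocation round.
     */
    static Map<String, String> assign(Map<String, String> taskToHost,
                                      List<String> containerHosts) {
        // Count how many free containers each host offered this round.
        Map<String, Integer> free = new HashMap<>();
        for (String host : containerHosts) {
            free.merge(host, 1, Integer::sum);
        }
        Map<String, String> assignment = new HashMap<>();
        for (Map.Entry<String, String> e : taskToHost.entrySet()) {
            String wanted = e.getValue();
            Integer n = free.get(wanted);
            if (n != null && n > 0) {
                assignment.put(e.getKey(), wanted); // data-local match
                free.put(wanted, n - 1);
            }
            // else: strict mode leaves the task pending rather than
            // placing it on a non-local container
        }
        return assignment;
    }
}
```

A semi-strict (best-effort) variant would fall back to any remaining free container instead of leaving the task pending.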
> On Sat, Dec 8, 2012 at 4:51 AM, Jean-Marc Spaggiari <[EMAIL PROTECTED]>
> wrote:
>
>> Hi Hiroyuki,
>>
>> Have you made any progress on that?
>>
>> I'm also looking at a way to assign specific Map tasks to specific
>> nodes (I want the Map to run where the data is).
>>
>> JM
>>
>> 2012/12/1, Michael Segel <[EMAIL PROTECTED]>:
>> > I haven't thought about reducers, but for mappers you need to
>> > override the data locality so that the framework thinks the data
>> > exists on the node where you want to send it.
>> > Again, not really recommended, since it will kill performance unless
>> > the compute time is at least an order of magnitude greater than the
>> > time it takes to transfer the data.
>> >
>> > Really, really don't recommend it....
>> >
>> > We did it as a hack, just to see if we could do it and get better
>> > overall
>> > performance for a specific job.
>> >
>> >
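
[Archive note: the locality-override trick Michael mentions can be
modeled as below. In real code you would subclass
org.apache.hadoop.mapreduce.InputSplit (or wrap FileSplit) and override
getLocations() to report the hosts you want treated as "local"; the
types here are simplified stand-ins, not Hadoop classes.]

```java
// A split interface standing in for Hadoop's InputSplit, which exposes
// getLocations() so the scheduler can prefer data-local placement.
interface Split {
    String[] getLocations();
}

// A split that "lies" about locality: it reports a fixed host list
// regardless of where the underlying blocks actually live, steering the
// scheduler toward those nodes. This is exactly why the trick hurts
// performance when the data is not really there.
class ForcedLocationSplit implements Split {
    private final String[] forcedHosts;

    ForcedLocationSplit(String... forcedHosts) {
        this.forcedHosts = forcedHosts;
    }

    @Override
    public String[] getLocations() {
        return forcedHosts.clone();
    }
}
```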
>> > On Dec 1, 2012, at 6:27 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>> >
>> >> Yes, scheduling is done on a Tasktracker heartbeat basis, so it is
>> >> certainly possible to do absolutely strict scheduling (although be
>> >> aware of the condition of failing/unavailable tasktrackers).
>> >>
>> >> Mohit's suggestion (delay scheduling in the fair scheduler config) is
>> >> somewhat like what you desire - but setting it to very high values is
>> >> bad for jobs that don't need this.
>> >>
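
[Archive note: the delay-scheduling knob Harsh refers to is, in the 1.x
fair scheduler, set in mapred-site.xml. The property name below is per
the 1.x fair scheduler docs; verify it against your Hadoop version.]

```xml
<!-- mapred-site.xml: delay scheduling for the MR1 fair scheduler.
     How long (ms) to hold a task hoping for a node-local slot before
     settling for a rack-local or off-rack one. -->
<property>
  <name>mapred.fairscheduler.locality.delay</name>
  <value>30000</value> <!-- 30s; very high values starve other jobs -->
</property>
```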
>> >> On Sat, Dec 1, 2012 at 4:11 PM, Hiroyuki Yamada <[EMAIL PROTECTED]>
>> >> wrote:
>> >>> Thank you all for the comments.
>> >>>
>> >>>> you ought to make sure your scheduler also does non-strict
>> >>>> scheduling of data-local tasks for jobs that don't require such
>> >>>> strictness
>> >>>
>> >>> I just want to make sure one thing.
>> >>> If I write my own scheduler, is it possible to do "strict"
>> >>> scheduling?
>> >>>
>> >>> Thanks
>> >>>
>> >>> On Thu, Nov 29, 2012 at 1:56 PM, Mohit Anchlia <[EMAIL PROTECTED]>
>> >>> wrote:
>> >>>> Look at locality delay parameter
>> >>>>
>> >>>> Sent from my iPhone
>> >>>>
>> >>>> On Nov 28, 2012, at 8:44 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>> >>>>
>> >>>>> None of the current schedulers are "strict" in the sense of "do not
>> >>>>> schedule the task if such a tasktracker is not available". That has
>> >>>>> never been a requirement for Map/Reduce programs, nor should it be.
>> >>>>>
>> >>>>> I feel if you want some code to run individually on all nodes for
>> >>>>> whatever reason, you may as well ssh into each one and start it
>> >>>>> manually with appropriate host-based parameters, etc., and then
>> >>>>> aggregate their results.
>> >>>>>
>> >>>>> Note that even if you get down to writing a scheduler for this
>> >>>>> (which I don't think is a good idea, but anyway), you ought to make
>> >>>>> sure your scheduler also does non-strict scheduling of data-local
>> >>>>> tasks for jobs that don't require such strictness - in order for
>> >>>>> them to complete quickly rather than wait around for scheduling in
>> >>>>> a fixed manner.
>> >>>>>
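
[Archive note: Harsh's "just ssh into each node and aggregate"
alternative can be sketched as below. Host names and the per-node script
path are placeholders; RUNNER defaults to a dry run that only echoes the
ssh commands, set RUNNER=ssh to actually execute.]

```shell
# Run a per-node task on every host over ssh, then aggregate the output.
# node01/node02 and /opt/jobs/per_node_task.sh are placeholders.
RUNNER="${RUNNER:-echo ssh}"   # dry-run by default; use RUNNER=ssh for real
mkdir -p results
for host in node01 node02; do
  $RUNNER "$host" /opt/jobs/per_node_task.sh > "results/${host}.out" &
done
wait   # let all per-node runs finish before aggregating
cat results/*.out > results/aggregated.txt
```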
>> >>>>> On Thu, Nov 29, 2012 at 6:00 AM, Hiroyuki Yamada <[EMAIL PROTECTED]>
>> >>>>> wrote:
>> >>>>>> Thank you all for the comments and advice.
>> >>>>>>
>> >>>>>> I know it is not recommended to assign mapper locations by myself.
>> >>>>>> But there needs to be one mapper running in each node in some