Hadoop, mail # user - Re: knowing the nodes on which reduce tasks will run


Abhay Ratnaparkhi 2012-09-03, 15:36
Bejoy Ks 2012-09-03, 15:46
Hi Abhay

You need to change this value and restart the TaskTracker before you
submit your job. Modifying this value mid-run won't affect jobs that are
already running.
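For reference, the change Bejoy describes would look like this in mapred-site.xml on each node you want to exclude from reduce work (Hadoop 1.x / MRv1; values and file paths are the standard ones, adjust for your install):

```xml
<!-- mapred-site.xml on the TaskTracker node to exclude from reduces -->
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>0</value>
</property>
```

After editing, the TaskTracker on that node has to be restarted for the change to take effect, e.g. `bin/hadoop-daemon.sh stop tasktracker` followed by `bin/hadoop-daemon.sh start tasktracker`. As noted above, this does not affect jobs that are already running.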

On Mon, Sep 3, 2012 at 9:06 PM, Abhay Ratnaparkhi <
[EMAIL PROTECTED]> wrote:

> How can I set 'mapred.tasktracker.reduce.tasks.maximum' to "0" on a
> running tasktracker?
> It seems that I need to restart the tasktracker, and in that case I'll
> lose the output of the map tasks run by that particular tasktracker.
>
> Can I change 'mapred.tasktracker.reduce.tasks.maximum' to "0" without
> restarting the tasktracker?
>
> ~Abhay
>
>
> On Mon, Sep 3, 2012 at 8:53 PM, Bejoy Ks <[EMAIL PROTECTED]> wrote:
>
>> HI Abhay
>>
>> The TaskTrackers on which the reduce tasks are triggered are chosen at
>> random based on reduce slot availability. So if you don't want reduce
>> tasks to be scheduled on some particular nodes, you need to set
>> 'mapred.tasktracker.reduce.tasks.maximum' on those nodes to 0. The
>> bottleneck here is that this property is not a job-level one; you need
>> to set it at the cluster level.
>>
>> A cleaner approach would be to configure each of your nodes with the
>> right number of map and reduce slots based on the resources available
>> on each machine.
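The placement rule described above (a random pick among TaskTrackers that still have free reduce slots) can be sketched as a toy model. This is illustrative only, not the actual JobTracker scheduler code; the function and node names are made up for the example:

```python
import random

def pick_tasktracker(free_reduce_slots):
    """Toy model of reduce-task placement: choose at random among
    TaskTrackers that still have free reduce slots. Not the real
    JobTracker algorithm, just an illustration of the behaviour
    described in the thread."""
    candidates = [tt for tt, free in free_reduce_slots.items() if free > 0]
    if not candidates:
        return None  # no free reduce slots anywhere; the task waits
    return random.choice(candidates)

# A node with mapred.tasktracker.reduce.tasks.maximum set to 0 simply
# never has free reduce slots, so it is never a candidate:
slots = {"node1": 2, "node2": 0, "node3": 1}
print(pick_tasktracker(slots))  # never prints "node2"
```

Under this model, setting the maximum to 0 on the old, disk-full nodes is exactly what removes them from the candidate set, which is why Bejoy's suggestion works.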
>>
>>
>> On Mon, Sep 3, 2012 at 7:49 PM, Abhay Ratnaparkhi <
>> [EMAIL PROTECTED]> wrote:
>>
>>> Hello,
>>>
>>> How can one get to know the nodes on which reduce tasks will run?
>>>
>>> One of my jobs is running and has completed all its map tasks.
>>> My map tasks write a lot of intermediate data, and the intermediate
>>> directory is filling up on all the nodes.
>>> If a reduce task runs on any of those nodes, it will try to copy the
>>> data to the same disk and will eventually fail with disk-space-related
>>> exceptions.
>>>
>>> I have added a few more tasktracker nodes to the cluster and now want
>>> to run reducers on the new nodes only.
>>> Is it possible to choose the node on which a reducer will run? What
>>> algorithm does Hadoop use to pick a node to run a reducer?
>>>
>>> Thanks in advance.
>>>
>>> Bye
>>> Abhay
>>>
>>
>>
>