MapReduce >> mail # user >> knowing the nodes on which reduce tasks will run


Re: knowing the nodes on which reduce tasks will run
Hi,

A reducer runs wherever a reduce slot is available. Its location is not
related to where the data is located, and it is not possible to choose where
a reducer will run (except by tweaking the tasktracker...).
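
To make the slot-based placement above concrete, here is a toy sketch in
Python. It is not Hadoop's scheduler code: the TaskTracker class, the
assign_reduces function and the node names are all hypothetical, and the real
JobTracker hands out tasks in response to tasktracker heartbeats. The sketch
only shows that a reduce task goes to whichever node has a free reduce slot,
with no regard for data location, and that a node with zero reduce slots never
receives one. In Hadoop 1.x the per-node slot count is, as far as I know, set
with mapred.tasktracker.reduce.tasks.maximum in that node's mapred-site.xml,
which is presumably the kind of tasktracker tweak meant here.

```python
from dataclasses import dataclass


@dataclass
class TaskTracker:
    """Hypothetical stand-in for a tasktracker node (not a Hadoop class)."""
    name: str
    reduce_slots: int          # max concurrent reduce tasks on this node
    running_reduces: int = 0   # reduce tasks currently occupying a slot

    def has_free_reduce_slot(self) -> bool:
        return self.running_reduces < self.reduce_slots


def assign_reduces(trackers, pending_reduces):
    """Give each pending reduce task to the first tracker with a free slot.

    Data location is never consulted. Returns a dict mapping each assigned
    reduce task id to a tracker name; tasks that find no free slot are
    simply left out (in a real cluster they would wait).
    """
    placement = {}
    for task_id in pending_reduces:
        for tracker in trackers:
            if tracker.has_free_reduce_slot():
                tracker.running_reduces += 1
                placement[task_id] = tracker.name
                break
    return placement


# Assumption for illustration: the old nodes have full disks, so they are
# given zero reduce slots; only the newly added nodes can accept reducers.
trackers = [
    TaskTracker("old-node-1", reduce_slots=0),
    TaskTracker("old-node-2", reduce_slots=0),
    TaskTracker("new-node-1", reduce_slots=2),
    TaskTracker("new-node-2", reduce_slots=2),
]
placement = assign_reduces(trackers, ["r0", "r1", "r2"])
print(placement)
```

In this model, every reduce task lands on one of the new nodes because the old
ones never report a free reduce slot. On a real cluster the setting would have
to be changed per node and the tasktracker restarted for it to take effect.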

Regards

Bertrand

On Mon, Sep 3, 2012 at 4:19 PM, Abhay Ratnaparkhi <
[EMAIL PROTECTED]> wrote:

> Hello,
>
> How can one find out which nodes the reduce tasks will run on?
>
> One of my jobs is running and it is completing all the map tasks.
> My map tasks write lots of intermediate data. The intermediate directory
> is getting full on all the nodes.
> If a reduce task runs on any of these nodes, it will try to copy the
> data to the same disk and will eventually fail with disk-space-related
> exceptions.
>
> I have added a few more tasktracker nodes to the cluster and now want to
> run the reducers on the new nodes only.
> Is it possible to choose the node on which a reducer will run? What
> algorithm does Hadoop use to pick a node to run a reducer?
>
> Thanks in advance.
>
> Bye
> Abhay
>

--
Bertrand Dechoux