Re: knowing the nodes on which reduce tasks will run
On 3 September 2012 15:19, Abhay Ratnaparkhi <[EMAIL PROTECTED]> wrote:

> Hello,
> How can one get to know the nodes on which reduce tasks will run?
> One of my jobs is running and it's completing all the map tasks.
> My map tasks write lots of intermediate data. The intermediate directory
> is getting full on all the nodes.
> If a reduce task runs on any node in the cluster, it'll try to copy the
> data to the same disk and will eventually fail with disk-space-related
> exceptions.
You could always set up dedicated partitions for intermediate data, though
you get better bandwidth by striping the data across all disks, and more
flexibility by letting intermediate and DFS data share the same partitions.
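
As a rough sketch of the dedicated-partition option, assuming a Hadoop 1.x
style mapred-site.xml in which mapred.local.dir lists the directories used
for intermediate data. The mount points below are hypothetical; substitute
your own:

```xml
<!-- mapred-site.xml (sketch; /data/1 and /data/2 are hypothetical mount points) -->
<property>
  <name>mapred.local.dir</name>
  <!-- Comma-separated list of local directories for intermediate data.
       Listing one directory per disk spreads the I/O across those disks. -->
  <value>/data/1/mapred/local,/data/2/mapred/local</value>
</property>
```

Pointing these at partitions that hold nothing else keeps intermediate data
from competing with DFS blocks, at the cost of the flexibility mentioned
above.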

There's also a property that limits how much space is used for DFS
storage: raise dfs.datanode.du.reserved (the number of bytes per volume
kept free for non-DFS use) and the datanodes will leave more free space
around for intermediate data.
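
For illustration only, a minimal hdfs-site.xml fragment that sets this
property. The 20 GB figure is an assumed example, not a recommendation;
size it to the intermediate data your jobs actually produce:

```xml
<!-- hdfs-site.xml (sketch; the value is an assumed example) -->
<property>
  <name>dfs.datanode.du.reserved</name>
  <!-- Bytes per volume left free for non-DFS use.
       20 * 1024^3 = 21474836480 bytes (20 GB). -->
  <value>21474836480</value>
</property>
```

The datanodes would need a restart to pick up the new value.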

see: http://wiki.apache.org/hadoop/DiskSetup