Reduce side question on MR
Hi,

I have one question related to the reduce phase of MR jobs.

The intermediate outputs of the map tasks are pulled from the nodes that
ran the map tasks to the node where the reducer is going to run, and that
intermediate data is written to the reducer's local fs. My question is:
if a job processes a huge amount of data and has multiple mappers but
only one reducer, is it possible that the job would never complete
successfully because the single host's disk is not large enough to hold
all of the job's map outputs?

In that case the job would essentially fail after retrying the configured
number of attempts.
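
To make the concern concrete, here is a rough back-of-the-envelope sketch in Python. The sizes are made-up numbers, the function name is hypothetical, and it assumes map output is partitioned evenly across reducers and ignores compression and merge overhead, so treat it as an illustration of the arithmetic rather than how Hadoop actually accounts for disk space.

```python
def reducer_disk_shortfall(map_output_bytes, num_reducers, reducer_disk_bytes):
    """Bytes by which one reducer's share of map output exceeds its local disk.

    Assumes (hypothetically) an even split of map output across reducers.
    Returns 0 when the share fits on the reducer's local disk.
    """
    # Ceiling division: the largest share any reducer gets under an even split.
    per_reducer = -(-map_output_bytes // num_reducers)
    return max(0, per_reducer - reducer_disk_bytes)


TB = 1024 ** 4
total_map_output = 10 * TB  # hypothetical total intermediate data
local_disk = 2 * TB         # hypothetical free local disk per reducer host

# One reducer must hold everything; eight reducers each hold 1.25 TB.
single = reducer_disk_shortfall(total_map_output, 1, local_disk)
spread = reducer_disk_shortfall(total_map_output, 8, local_disk)
print(single // TB, spread)
```

With these numbers the single reducer comes up 8 TB short, while eight reducers each fit within their local disk.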

Thanks,
Rahul