I have one question related to the reduce phase of MR jobs.
The intermediate outputs of the map tasks are pulled from the nodes that
ran the map tasks to the node where the reducer is going to run, and that
intermediate data is written to the reducer's local fs. My question is that
if a job processes a huge amount of data and has multiple mappers but
only one reducer, then it's possible that the job would never complete
successfully, as the single host's disk might not be sufficient to hold
all the map outputs of the job.
The job would essentially fail after retrying the configured number of attempts.
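
To make the concern concrete, here is a rough back-of-the-envelope sketch in Python. It assumes map output is spread evenly across reducers (skewed keys would break this) and ignores compression, in-memory merging, and disk space used by other tasks. The function names and the 10 TB / 2 TB figures are made up for illustration; they do not come from a real cluster or from any Hadoop API.

```python
TB = 1024 ** 4  # bytes in one tebibyte


def fits_on_reducer_disk(total_map_output_bytes: int,
                         num_reducers: int,
                         free_disk_bytes_per_node: int) -> bool:
    """Return True if each reducer's share of the map output fits on its node.

    Assumption: map output is partitioned evenly across reducers.
    """
    if num_reducers < 1:
        raise ValueError("num_reducers must be at least 1")
    # Integer comparison avoids floating-point rounding on large byte counts.
    return total_map_output_bytes <= free_disk_bytes_per_node * num_reducers


def min_reducers_to_fit(total_map_output_bytes: int,
                        free_disk_bytes_per_node: int) -> int:
    """Smallest reducer count whose even share fits on one node's free disk."""
    if free_disk_bytes_per_node < 1:
        raise ValueError("free_disk_bytes_per_node must be positive")
    # Ceiling division using integers only.
    needed = -(-total_map_output_bytes // free_disk_bytes_per_node)
    return max(1, needed)


# Hypothetical figures: 10 TB of total map output, 2 TB free per node.
total_map_output = 10 * TB
free_per_node = 2 * TB

single_reducer_ok = fits_on_reducer_disk(total_map_output, 1, free_per_node)
reducers_needed = min_reducers_to_fit(total_map_output, free_per_node)

print("single reducer fits:", single_reducer_ok)
print("minimum reducers under the even-split assumption:", reducers_needed)
```

With these made-up numbers the single reducer cannot hold the data, which is the situation I am asking about.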