Re: Reduce side question on MR
I don't see a direct question asked, but here's a condition in the
source code you want to take a look at (*):
https://github.com/apache/hadoop-common/blob/branch-1/src/mapred/org/apache/hadoop/mapred/JobInProgress.java#L2316

(*) - Yet to appear in MRv2 - See/help out with MAPREDUCE-2723.
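
For readers who do not want to open the source, here is a minimal sketch of what such a check could look like. It assumes the linked condition is a free-space test made before a reduce task is handed to a node: the scheduler compares the node's available disk space with the estimated reduce input size and skips the node if the input would not fit. The class and method names below are hypothetical and are not Hadoop's API, and the even split across reducers is a simplifying assumption.

```java
// Hypothetical sketch only: these names are illustrative, not taken from Hadoop.
final class ReduceSpaceCheck {

    /**
     * Rough estimate of one reducer's input: total map output divided evenly
     * across reducers. Assumption: keys are evenly distributed; skewed data
     * would make some reducers larger than this.
     */
    static long estimatedReduceInputBytes(long totalMapOutputBytes, int numReducers) {
        if (numReducers <= 0) {
            throw new IllegalArgumentException("numReducers must be positive");
        }
        return totalMapOutputBytes / numReducers;
    }

    /** True if the node reports enough free space to hold the estimated input. */
    static boolean hasRoomForReduce(long availableBytes, long estimatedInputBytes) {
        return availableBytes >= estimatedInputBytes;
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024L * 1024L;
        long totalMapOutput = 500L * gib; // made-up figure for illustration
        long freeOnNode = 200L * gib;     // made-up figure for illustration

        // One reducer must hold all map output on a single host.
        System.out.println(hasRoomForReduce(
                freeOnNode, estimatedReduceInputBytes(totalMapOutput, 1)));

        // Four reducers each hold roughly a quarter of it.
        System.out.println(hasRoomForReduce(
                freeOnNode, estimatedReduceInputBytes(totalMapOutput, 4)));
    }
}
```

With these made-up numbers the single-reducer case does not fit on the node while the four-reducer case does, which is the situation described in the quoted message below: raising the number of reducers spreads the intermediate data over more hosts.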

On Wed, May 29, 2013 at 8:10 PM, Rahul Bhattacharjee
<[EMAIL PROTECTED]> wrote:
> Hi,
>
> I have one question related to the reduce phase of MR jobs.
>
> The intermediate outputs of map tasks are pulled from the nodes that ran
> the map tasks to the node where the reducer is going to run, and that
> intermediate data is written to the reducer's local filesystem. My question
> is that if there is a job processing a huge amount of data and it has
> multiple mappers but only one reducer, then it's possible that the job would
> never complete successfully, as the single host's disk might not be
> sufficient to hold all the map outputs of the job.
>
> The job would essentially fail after retrying the configured number of
> attempts.
>
> Thanks,
> Rahul

--
Harsh J