MapReduce >> mail # user >> Reduce side question on MR


Rahul Bhattacharjee 2013-05-29, 14:40
Re: Reduce side question on MR
I don't see a direct question asked, but here's a condition in the
source code that you may want to take a look at (*):
https://github.com/apache/hadoop-common/blob/branch-1/src/mapred/org/apache/hadoop/mapred/JobInProgress.java#L2316

(*) - This check has yet to appear in MRv2. See (or help out with) MAPREDUCE-2723.

On Wed, May 29, 2013 at 8:10 PM, Rahul Bhattacharjee
<[EMAIL PROTECTED]> wrote:
> Hi,
>
> I have one question related to the reduce phase of MR jobs.
>
> The intermediate outputs of map tasks are pulled from the nodes which ran
> the map tasks to the node where the reducer is going to run, and that
> intermediate data is written to the reducer's local fs. My question is: if a
> job is processing a huge amount of data and it has multiple mappers but only
> one reducer, then is it possible that the job would never complete
> successfully, as the single host's disk might not be sufficient to hold all
> the map outputs of the job?
>
> The job would essentially fail after retrying the configured number of attempts.
>
> Thanks,
> Rahul
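
To make the concern in the quoted question concrete, here is a minimal sketch of the arithmetic involved: add up the map output sizes headed for a single reducer and compare the total with the free local disk on the node that would run it. This is only an illustration and is not copied from JobInProgress.java. The class name, method names, and all the sizes are hypothetical.

```java
import java.util.Arrays;

public class ReduceSpaceCheck {

    // Hypothetical helper: total bytes of map output destined for one reducer.
    static long totalShuffleBytes(long[] mapOutputBytes) {
        return Arrays.stream(mapOutputBytes).sum();
    }

    // Hypothetical check: does the node's free local disk cover the expected
    // reduce input? Space needed for merging is ignored in this sketch.
    static boolean hasRoomForReduce(long expectedReduceInputBytes, long freeLocalBytes) {
        return freeLocalBytes >= expectedReduceInputBytes;
    }

    public static void main(String[] args) {
        long gib = 1L << 30;

        // Made-up job: 400 map tasks, each producing 2 GiB for the single reducer.
        long[] mapOutputs = new long[400];
        Arrays.fill(mapOutputs, 2 * gib);

        long expected = totalShuffleBytes(mapOutputs);
        long free = 500 * gib; // made-up free space on the reducer's node

        System.out.println("expected reduce input (GiB): " + expected / gib);
        System.out.println("fits on one node: " + hasRoomForReduce(expected, free));
    }
}
```

With these made-up numbers the expected reduce input is 800 GiB against 500 GiB free, so the check reports that the input does not fit. That is the situation described in the question.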

--
Harsh J
Rahul Bhattacharjee 2013-06-01, 07:33