MapReduce >> mail # user >> Reduce side question on MR
Re: Reduce side question on MR
Thanks, Harsh, for the response. It very much answers what I was looking for.

Regards,
Rahul
On Wed, May 29, 2013 at 8:10 PM, Rahul Bhattacharjee <
[EMAIL PROTECTED]> wrote:

> Hi,
>
> I have one question related to the reduce phase of MR jobs.
>
> The intermediate outputs of map tasks are pulled from the nodes that
> ran the map tasks to the node where the reducer is going to run, and that
> intermediate data is written to the reducer's local fs. My question is:
> if a job processes a huge amount of data and has multiple
> mappers but only one reducer, is it possible that the job would never
> complete successfully because the single host's disk might not be
> sufficient to hold all the map outputs of the job?
>
> The job would essentially fail after retrying the configured number of
> attempts.
>
> Thanks,
> Rahul
>
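
The disk concern in the quoted question can be checked with rough arithmetic. The sketch below is only a back-of-the-envelope estimate, not anything taken from Hadoop itself: the function name `min_reducers`, the `headroom` factor, and the sample sizes are all hypothetical, and it assumes map output is spread evenly across reducers, which skewed keys would break.

```python
import math

TB = 1024 ** 4  # bytes in one tebibyte


def min_reducers(total_map_output_bytes, free_disk_bytes_per_node, headroom=0.5):
    """Estimate the smallest reducer count whose per-reducer share fits on disk.

    Assumptions (not Hadoop settings): map output is partitioned evenly, and
    only a `headroom` fraction of each node's free disk is usable, leaving
    space for merge passes and other work on the node.
    """
    usable = free_disk_bytes_per_node * headroom
    if usable <= 0:
        raise ValueError("no usable disk space per node")
    # At least one reducer is always needed, even for tiny outputs.
    return max(1, math.ceil(total_map_output_bytes / usable))


if __name__ == "__main__":
    # Hypothetical job: 10 TB of map output, 2 TB free on each reducer node.
    print(min_reducers(10 * TB, 2 * TB))
```

If the estimate comes out above 1, a single-reducer job is at risk of running out of local disk, and raising the reducer count (for example through `Job.setNumReduceTasks`) spreads the map output across more hosts.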