Re: When does Reduce job start
It's part of the design that reduce() does not get called until the map
phase is complete. You're seeing the reduce task reported as started when
the map phase is at 90% because Hadoop is already shuffling data from the
mappers that have completed. As currently designed, you can't start
reduce() early because there is no way to guarantee you have all the values
for any key until all the mappers are done. reduce() requires a key and all
the values for that key in order to execute.
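
To make the distinction between "reduce task started" and "reduce() called" concrete, here is a toy simulation in plain Python. It is only a sketch and not Hadoop code: run_job and slowstart_fraction are names made up for illustration, and the "reducer" just sums the values for each key. It shows that the shuffle can begin once some fraction of the mappers have finished, but the value list for a key is only complete after the last mapper is done.

```python
from collections import defaultdict


def run_job(map_outputs, slowstart_fraction=0.9):
    """Simulate a single reducer pulling output from several mappers.

    map_outputs: one list of (key, value) pairs per mapper, in the order
        the mappers finish.
    slowstart_fraction: hypothetical knob (not a real Hadoop API): the
        fraction of mappers that must finish before the reduce task
        starts shuffling.
    Returns (events, snapshot, results).
    """
    total = len(map_outputs)
    shuffled = defaultdict(list)   # key -> values copied to the reducer so far
    events = []
    snapshot = None                # what the reducer holds when it first starts
    fetched = 0                    # number of mappers whose output was copied

    for done in range(1, total + 1):
        if done / total < slowstart_fraction:
            continue               # reduce task is not scheduled yet
        # Shuffle: copy the output of every finished mapper not yet fetched.
        for output in map_outputs[fetched:done]:
            for key, value in output:
                shuffled[key].append(value)
        fetched = done
        if snapshot is None:
            events.append(("reduce task started", done))
            snapshot = {k: list(v) for k, v in shuffled.items()}

    # Safety net: copy anything still missing once the map phase is over.
    for output in map_outputs[fetched:]:
        for key, value in output:
            shuffled[key].append(value)

    # Only now is every key's value list complete, so reduce() can run.
    events.append(("reduce() called", total))
    results = {key: sum(values) for key, values in sorted(shuffled.items())}
    return events, snapshot, results


if __name__ == "__main__":
    outputs = [[("a", 1)], [("b", 2)], [("a", 3)], [("b", 4), ("a", 5)]]
    events, snapshot, results = run_job(outputs, slowstart_fraction=0.5)
    print(events)
    print(snapshot)
    print(results)
```

In the sample run, the reduce task starts after 2 of 4 mappers, at which point it holds only [1] for key "a"; the remaining values 3 and 5 arrive from mappers that finish later, which is why reduce() has to wait. As far as I know, the point at which reduce tasks get scheduled in Hadoop of this era is governed by the job property mapred.reduce.slowstart.completed.maps (a fraction of completed maps); please check mapred-default.xml for your version before relying on that. Lowering it only starts the shuffle sooner; it does not make reduce() run before the map phase ends.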

On Tue, Jan 4, 2011 at 10:53 AM, sagar naik <[EMAIL PROTECTED]> wrote:

> Hi All,
> Number of map tasks: 1000s
> Number of reduce tasks: single digit
> In such cases the reduce task won't start even when a few map tasks are
> completed.
> Example:
> In my observation of a sample run of bin/hadoop jar
> hadoop-*examples*.jar pi 10000 10, the reduce did not start until 90%
> of the map tasks were complete.
> The only reason I can think of for not starting a reduce task is to
> avoid the unnecessary transfer of map output data in case of
> failures.
> Is there a way to start the reduce task sooner in such a case?
> What is the configuration parameter to change this behavior?
> Thanks,
> Sagar