Hadoop, mail # user - When does Reduce job start


sagar naik 2011-01-04, 18:53
Allen Wittenauer 2011-01-04, 20:16
Re: When does Reduce job start
Jeff Bean 2011-01-04, 23:14
It's part of the design that reduce() does not get called until the map
phase is complete. You're seeing the reduce task reported as started when the
map phase is at 90% because Hadoop is already shuffling data from the mappers
that have completed. As currently designed, you can't start reduce() early
because there is no way to guarantee you have all the values for any key
until all the mappers are done. reduce() requires a key and all the values
for that key in order to execute.
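
To make the distinction between shuffling and reducing concrete, here is a
minimal sketch in plain Java. It is not Hadoop code: the class ShuffleSketch
and its methods are hypothetical names made up for illustration, and summing
the values merely stands in for whatever reduce() would do. It only shows
that map output can be copied as each mapper finishes, while the reduce step
has to wait for the last one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names come from Hadoop.
public class ShuffleSketch {
    // Values copied so far, grouped by key (stands in for the shuffled input).
    private final Map<String, List<Integer>> fetched = new TreeMap<>();
    private final int totalMaps;
    private int completedMaps = 0;

    public ShuffleSketch(int totalMaps) {
        this.totalMaps = totalMaps;
    }

    // Shuffle step: copy one finished mapper's output.
    // This can happen while other mappers are still running.
    public void fetchMapOutput(Map<String, List<Integer>> mapOutput) {
        for (Map.Entry<String, List<Integer>> e : mapOutput.entrySet()) {
            fetched.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                   .addAll(e.getValue());
        }
        completedMaps++;
    }

    // Only when every mapper has reported in do we hold all values per key.
    public boolean canReduce() {
        return completedMaps == totalMaps;
    }

    // Reduce step: here just a sum of the values for each key.
    public Map<String, Integer> reduceAll() {
        if (!canReduce()) {
            throw new IllegalStateException("some map output is still missing");
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : fetched.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            result.put(e.getKey(), sum);
        }
        return result;
    }

    public static void main(String[] args) {
        ShuffleSketch s = new ShuffleSketch(2);
        s.fetchMapOutput(Map.of("a", List.of(1, 2)));   // first mapper done
        System.out.println(s.canReduce());              // false
        s.fetchMapOutput(Map.of("a", List.of(3), "b", List.of(4)));
        System.out.println(s.reduceAll());              // {a=6, b=4}
    }
}
```

After the first mapper, the sketch already holds two values for key "a", but
reducing at that point would miss the third value that arrives with the
second mapper.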

Jeff
On Tue, Jan 4, 2011 at 10:53 AM, sagar naik <[EMAIL PROTECTED]> wrote:

> Hi All,
>
> Number of map tasks: 1000s
> Number of reduce tasks: single digit
>
> In such cases the reduce task won't start even when a few map tasks
> have completed.
> Example:
> In my observation of a sample run of bin/hadoop jar
> hadoop-*examples*.jar pi 10000 10, the reduce did not start until 90%
> of the map tasks were complete.
>
> The only reason I can think of for not starting a reduce task is to
> avoid the unnecessary transfer of map output data in case of
> failures.
>
>
> Is there a way to start the reduce task sooner in such a case?
> What is the configuration parameter to change this behavior?
>
>
>
> Thanks,
> Sagar
>
sagar naik 2011-01-05, 01:14
James Seigel 2011-01-05, 01:18
Harsh J 2011-01-05, 03:23
sagar naik 2011-01-05, 06:40