Re: reducer tasks start time issue
A reduce can't start processing its keys until it has fetched its
partition from every map, and any map may produce output for any
reducer. Hence, we generally wait until all maps have finished and
their partition outputs have been copied over to the reduces before
we begin to group and process the keys.
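
To make this concrete, here is a small Python sketch of the idea. It is
a toy simulation, not Hadoop code: the names (partition, run_map,
reduce_partition, reduce_all), the CRC32-based partitioner, and the
word-count data are all assumptions made up for illustration.

```python
import zlib
from collections import defaultdict

NUM_REDUCERS = 2  # hypothetical setting for this sketch


def partition(key, num_reducers=NUM_REDUCERS):
    # Deterministic stand-in for a hash partitioner (assumption, not Hadoop's).
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_map(words):
    # Word-count style map: its output is split into one partition per reducer.
    parts = defaultdict(list)
    for word in words:
        parts[partition(word)].append((word, 1))
    return parts


def reduce_partition(map_outputs, reducer_id):
    # Group and sum every record that the given maps sent to reducer_id.
    totals = defaultdict(int)
    for parts in map_outputs:
        for key, value in parts.get(reducer_id, []):
            totals[key] += value
    return dict(totals)


def reduce_all(map_outputs):
    # Run every reducer and merge the results (keys never overlap across reducers).
    merged = {}
    for reducer_id in range(NUM_REDUCERS):
        merged.update(reduce_partition(map_outputs, reducer_id))
    return merged


map_inputs = [["a", "b", "a"], ["b", "c"], ["a", "c", "c"]]
map_outputs = [run_map(words) for words in map_inputs]

complete = reduce_all(map_outputs)    # reducers waited for all three maps
early = reduce_all(map_outputs[:-1])  # reducers started before the last map finished

print("complete:", complete)
print("early:   ", early)
```

The reducers that start before the last map has finished undercount
"a" and "c", because that map still had records destined for them.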

However, since you have started thinking about this, this paper on
"Online" Hadoop may interest you:
http://www.neilconway.org/docs/nsdi2010_hop.pdf

On Sat, Dec 22, 2012 at 6:55 PM, Lin Ma <[EMAIL PROTECTED]> wrote:
> Hi guys,
>
> Supposing in a Hadoop job, there are both mappers and reducers. My question
> is, reducer tasks cannot begin until all mapper tasks complete? If so, why
> designed in this way?
>
> thanks in advance,
> Lin

--
Harsh J