A reducer can't process its complete data set until it has fetched all
of its partitions, and any map may produce a partition for any reducer.
Hence, we generally wait until all maps have finished, and their
partition outputs are ready and copied over to the reducers, before we
begin to group and process the keys.
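
To make the point concrete, here is a rough simulation in plain Python (not
Hadoop code). The function names, the three input splits, and the use of
crc32 as a stand-in for the hash partitioner are all assumptions made up for
illustration. It only shows that a reducer that runs after a single map has
finished sees an incomplete count for a key.

```python
import zlib
from collections import defaultdict

NUM_REDUCERS = 2  # hypothetical number of reduce tasks


def partition(key, num_reducers=NUM_REDUCERS):
    # Stand-in for hash partitioning: same key always goes to the same reducer.
    # crc32 is used only so the result is deterministic across runs.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_map(split):
    # Word-count style map over one input split.
    # Returns {reducer_id: [(key, 1), ...]}, i.e. one partition per reducer.
    out = defaultdict(list)
    for word in split.split():
        out[partition(word)].append((word, 1))
    return out


def reduce_partition(reducer_id, map_outputs):
    # Merge this reducer's partition from the given map outputs, sum per key.
    totals = defaultdict(int)
    for out in map_outputs:
        for key, value in out.get(reducer_id, []):
            totals[key] += value
    return dict(totals)


splits = ["a b a", "b c", "a c c"]          # hypothetical input splits
map_outputs = [run_map(s) for s in splits]  # one output per map task

r = partition("a")                            # the reducer responsible for "a"
early = reduce_partition(r, map_outputs[:1])  # only the first map has finished
final = reduce_partition(r, map_outputs)      # all maps have finished

print("early:", early.get("a"), "final:", final.get("a"))
```

The key "a" also shows up in the third split, so the early result undercounts
it. The reducer has no way of knowing that in advance, which is why it waits
for every map.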
However, given that you began thinking about this, this paper on
"Online" Hadoop may interest you:
On Sat, Dec 22, 2012 at 6:55 PM, Lin Ma <[EMAIL PROTECTED]> wrote:
> Hi guys,
> Supposing in a Hadoop job, there are both mappers and reducers. My question
> is, reducer tasks cannot begin until all mapper tasks complete? If so, why
> designed in this way?
> thanks in advance,