Jay Vyas 2012-10-13, 02:16
Think of it in partition terms. If you know that your map-splits X, Y
and Z won't emit any key of partition P, then the Pth reducer can jump
ahead and run without those X, Y and Z completing their processing.
Otherwise, a reducer can't run until all maps have completed, for fear
of losing keys emitted by the maps it has skipped fetching from. Some
may find that loss tolerable, and some would be OK with receiving those
keys later, but that adds complexity when you could just
fetch continuously and wait.
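To make the partition argument concrete, here is a rough sketch in plain Python (not Hadoop code). It assumes you somehow know, ahead of time, which partitions each map split can possibly emit to; stock MapReduce does not track this, so the `possible_partitions` mapping and the `runnable_reducers` helper are hypothetical names used only for illustration.

```python
from typing import Dict, Set


def runnable_reducers(
    possible_partitions: Dict[str, Set[int]],
    completed: Set[str],
    num_reducers: int,
) -> Set[int]:
    """Return the reducers (partition numbers) that could safely start now.

    possible_partitions: hypothetical, user-supplied knowledge of which
        partitions each map split may emit keys into.
    completed: names of map splits that have already finished.
    """
    blocked: Set[int] = set()
    for split, partitions in possible_partitions.items():
        # A split that is still running blocks every partition it might
        # still emit into, since its output cannot be fetched yet.
        if split not in completed:
            blocked |= partitions
    return set(range(num_reducers)) - blocked


if __name__ == "__main__":
    # Assumed example: X, Y and Z never emit into partition 2; only W does.
    splits = {"X": {0, 1}, "Y": {1}, "Z": {0}, "W": {2}}
    # W has finished while X, Y and Z are still running.
    print(sorted(runnable_reducers(splits, {"W"}, 3)))
```

With no such knowledge, every split has to be treated as possibly emitting into every partition, and the function only returns reducers once all splits are in `completed`, which is the normal behaviour described above.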
Should be easy to take the MRv2 application and add such a thing
in today, if you need it.
 - Given the confusion between what MRv2 and YARN mean individually
(they get mixed up too much), hope this blog post of mine helps:
On Sat, Oct 13, 2012 at 7:46 AM, Jay Vyas <[EMAIL PROTECTED]> wrote:
> Is it possible for reducers to start (not just copying, but actually)
> "reducing" before all mappers are done, speculatively?
> In particular I'm asking this because I'm curious about the internals of how
> the shuffle and sort might (or might not :)) be able to support this.