Hadoop >> mail # user >> speculative execution before mappers finish


Re: speculative execution before mappers finish
Think of it in partition terms. If you know that your map-splits X, Y
and Z won't emit any key of partition P, then the Pth reducer can jump
ahead and run without those X, Y and Z completing their processing.

Otherwise, a reducer can't run until all maps have completed, for fear
of losing a few keys that may have come out of the maps it skipped
fetching from. To some this may be tolerable, and some would be OK
receiving those keys later, but that's going to add complexity when you
could just fetch continuously and wait.
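
To make the partition argument concrete, here is a minimal sketch of the check a scheduler would need. It is not Hadoop code: the function name, the `may_emit` mapping and the split ids are all hypothetical, and it assumes you somehow know in advance which partitions each map-split can emit to.

```python
def reducer_can_start(partition, completed_maps, may_emit):
    """Return True if every map that might emit to `partition` has completed.

    may_emit is a hypothetical mapping from a map-split id to the set of
    partitions that split could write to; stock MapReduce does not track this.
    """
    return all(
        split in completed_maps
        for split, partitions in may_emit.items()
        if partition in partitions
    )


# Hypothetical job: splits X, Y and Z never emit a key of partition 1.
may_emit = {
    "W": {0, 1},
    "X": {0},
    "Y": {0},
    "Z": {0},
}
completed = {"W"}

print(reducer_can_start(1, completed, may_emit))  # True: only W feeds partition 1
print(reducer_can_start(0, completed, may_emit))  # False: X, Y and Z are still running
```

Without that per-split knowledge, every split has to be treated as a possible source for every partition, and the check collapses to "all maps have completed".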

It should be easy to take the MRv2 application [0] and add such a thing
today, if you need it.

[0] - Given the confusion between what MRv2 and YARN mean individually
(they get mixed up too much), hope this blog post of mine helps:
http://www.cloudera.com/blog/2012/10/mr2-and-yarn-briefly-explained/

On Sat, Oct 13, 2012 at 7:46 AM, Jay Vyas <[EMAIL PROTECTED]> wrote:
> Is it possible for reducers to start (not just copying, but actually)
> "reducing" before all mappers are done, speculatively?
>
> In particular I'm asking this because I'm curious about the internals of how
> the shuffle and sort might (or might not :)) be able to support this.

--
Harsh J