I realize that the intended purpose of speculative execution is to overcome individual slow tasks...and I have read that it explicitly is *not* intended to start copies of a task simultaneously and to then race them, but rather to start copies of tasks that "seem slow" after running for a while.
...but aside from merely being slow, sometimes tasks arbitrarily fail, and not in data-driven or otherwise deterministic ways. A task may fail and then succeed on a subsequent attempt...but the total job time is extended by the time wasted during the initial failed task attempt.
It would be super-swell to run copies of a task simultaneously from the starting line and simply kill the copies after the winner finishes. While this is "wasteful" in some sense (that is the argument offered for not running speculative execution this way to begin with), it would be more precise to say that different users have different priorities under various use-case scenarios. The "wasting" of duplicate tasks on extra cores may be an acceptable cost toward the higher priority of minimizing job time for a given application.
Is there any notion of this in Hadoop?
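For comparison, the race-and-cancel semantics I have in mind look a lot like Java's own ExecutorService.invokeAny, which runs a set of callables, returns the first successful result, and cancels the rest. This is just a minimal single-JVM sketch of the idea (the failure simulation and the class name are mine, not anything in Hadoop), not a MapReduce feature:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class RaceTasks {
    // Launch several identical attempts at once; invokeAny blocks until one
    // attempt succeeds, returns its result, and cancels the remaining copies.
    public static String raceAttempts(int copies) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(copies);
        try {
            List<Callable<String>> attempts = new ArrayList<>();
            for (int i = 0; i < copies; i++) {
                final int id = i;
                attempts.add(() -> {
                    // Simulate a nondeterministic, non-data-driven failure:
                    // here attempt 0 always throws, standing in for a flaky task.
                    if (id == 0) {
                        throw new RuntimeException("transient task failure");
                    }
                    return "attempt-" + id + " succeeded";
                });
            }
            // First success wins; the losing attempts are cancelled.
            return pool.invokeAny(attempts);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(raceAttempts(3));
    }
}
```

If all copies fail, invokeAny throws an ExecutionException, so a genuinely data-driven failure still surfaces rather than hanging the job.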
Keith Wiley [EMAIL PROTECTED] keithwiley.com music.keithwiley.com
"Luminous beings are we, not this crude matter."