On 19.11.2012 at 16:14, "Kartashov, Andy" <[EMAIL PROTECTED]> wrote:
> Does MapReduce run tasks on redundant blocks?
> Say you have only 1 block of data replicated 3 times, one copy on each of three DataNodes: block 1 – DN1 / block 1 (replica #1) – DN2 / block 1 (replica #2) – DN3.
> Will MR attempt:
> a. to start 3 map tasks (one per replicated block) and execute them all,
> b. to start 3 map tasks (one per replicated block) and drop the other two as soon as one of the three has executed successfully, or
> c. to start only 1 map task (for just one block, ignoring the replicas) and start another one (on one of the replicas) if and only if the initial task (say, on DN1) fails?
The JobTracker initially schedules the map task on one node only. There is no need to launch the task on every node that holds a local copy of the block.
If a task fails during execution (because of a node failure, for example), the JobTracker launches the task again on another node that holds that block.
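
To make the retry behaviour concrete, here is a rough sketch in Python. It is a toy model, not Hadoop code: run_map_task, the attempt callback and the max_attempts cap are all names I made up for illustration, and the real scheduler weighs more factors than this.

```python
def run_map_task(block, replica_nodes, attempt, max_attempts=4):
    """Toy model: one map task per block, retried on another replica holder.

    `attempt(block, node)` is a hypothetical callback that returns True when
    the task succeeds on that node. `max_attempts` is an assumed cap on the
    number of attempts, not a value read from any real configuration.
    """
    tried = []
    # Only one attempt runs at a time; the next node is used only after a failure.
    for node in replica_nodes[:max_attempts]:
        tried.append(node)
        if attempt(block, node):
            return node, tried
    raise RuntimeError(f"map task for {block} failed on {tried}")


# Example: the attempt on DN1 fails, so the task is relaunched on DN2.
# DN3 is never used because the second attempt succeeds.
node, tried = run_map_task(
    "block1", ["DN1", "DN2", "DN3"], lambda block, node: node != "DN1"
)
print(node, tried)  # DN2 ['DN1', 'DN2']
```

This corresponds to option (c) in the question: the replicas only come into play when the first attempt fails.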
There is also an advanced feature called speculative execution. If a task is progressing slowly through a phase (perhaps because of flaky hardware), the JobTracker launches a second attempt of the same task in parallel on another node. The output of whichever attempt finishes first is used, and the slower attempt is killed.
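
Speculative execution can be sketched the same way. Again this is only a toy model under my own assumptions: Attempt, speculate and the detect_at parameter (the time at which the scheduler decides the original attempt is slow) are hypothetical names, and the real JobTracker uses progress estimates rather than known durations.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    node: str
    start: float     # time the attempt was launched
    duration: float  # how long it would take to finish if left alone

    @property
    def finish(self):
        return self.start + self.duration


def speculate(original, backup_node, backup_duration, detect_at):
    """Return (winner, killed) for a toy speculative-execution race.

    If the original attempt is still running at `detect_at`, a backup attempt
    starts on `backup_node`. Whichever attempt finishes first wins and the
    other one is killed. If no backup was needed, killed is None.
    """
    if original.finish <= detect_at:
        return original, None
    backup = Attempt(backup_node, detect_at, backup_duration)
    if backup.finish < original.finish:
        return backup, original
    return original, backup


# Example: DN1 would need 100 time units; a backup started at t=30 on DN2
# needs only 20, so it finishes at t=50 and the attempt on DN1 is killed.
winner, killed = speculate(Attempt("DN1", 0, 100), "DN2", 20, detect_at=30)
print(winner.node, killed.node)  # DN2 DN1
```

In Hadoop 1.x this behaviour is controlled per job by the mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution properties; please check the defaults for your own version.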