Just a follow-up to see if anyone can shed some light on this:
My understanding is that after each block is replicated 3 times, a map task is run on each of the replicas in parallel.
What I am trying to verify is this: in a scenario where a file is split into 10K, 100K, or more blocks, this would result in 30K, 300K, or more map tasks being performed, which looks like overkill from both a performance and a logical perspective.
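To put numbers on the two possible readings, here is a rough sketch in Python. It assumes one map task per block and a replication factor of 3; the function name map_task_counts is made up for illustration and is not part of Hadoop. It only does the arithmetic and does not say which reading is the one Hadoop actually follows.

```python
def map_task_counts(num_blocks, replication_factor=3):
    """Return (tasks if one runs per replica, tasks if one runs per block).

    Hypothetical helper for illustration only, not a Hadoop API.
    """
    per_replica = num_blocks * replication_factor  # every replica gets its own task
    per_block = num_blocks                         # replicas only serve as fallbacks
    return per_replica, per_block


if __name__ == "__main__":
    for blocks in (10_000, 100_000):
        per_replica, per_block = map_task_counts(blocks)
        print(f"{blocks} blocks: {per_replica} tasks if one per replica, "
              f"{per_block} tasks if one per block")
```

Under the first reading the task count grows with the replication factor; under the second it depends only on the number of blocks.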
I would appreciate any thoughts on this.
From: Sai Sai <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; Sai Sai <[EMAIL PROTECTED]>
Sent: Friday, 12 April 2013 1:37 PM
Subject: Re: Does a Map task run 3 times on 3 TTs or just once
Just wondering whether it is right to assume that a map task is run 3 times on 3 different TTs in parallel, and that the output of whichever one completes first is picked up and written to the intermediate location.
Or is it true that a map task, even though its data is replicated 3 times, will run only once, with the other 2 replicas on standby, so that if the first run fails, the second one will run, followed by the third if the second mapper fails?
Please shed some light.