MapReduce user mailing list

Re: 100K Maps scenario

Just a follow-up to see if anyone can shed some light on this:
My understanding is that after each block is replicated 3 times, a map task is run on each of the replicas in parallel.
What I am trying to verify is this: if a file is split into 10K, 100K, or more blocks, that would mean three times as many map tasks (at least 300K for 100K blocks), which looks like overkill from both a performance and a logical perspective.
I would appreciate any thoughts on this.
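To make the numbers concrete, here is a minimal sketch of the arithmetic behind the two models raised in this thread: one map task per replica versus one map task per block with the other replicas kept as fallbacks. The function names and the replication factor of 3 are assumptions for illustration only; they are not Hadoop APIs and do not model what the framework actually does.

```python
# Hypothetical helpers comparing the two scheduling models described in this thread.
# Neither function is a Hadoop API; they only do the task-count arithmetic.

REPLICATION_FACTOR = 3  # replication factor assumed in the posts


def tasks_if_run_on_every_replica(num_blocks, replication=REPLICATION_FACTOR):
    """Model 1: a map task runs on every replica of every block."""
    return num_blocks * replication


def tasks_if_run_once_per_block(num_blocks):
    """Model 2: one map task per block; the other replicas are only fallbacks."""
    return num_blocks


if __name__ == "__main__":
    for blocks in (10_000, 100_000):
        print(f"{blocks} blocks: "
              f"{tasks_if_run_on_every_replica(blocks)} tasks (every replica), "
              f"{tasks_if_run_once_per_block(blocks)} tasks (once per block)")
```

Under the first model the task count grows with the replication factor, which is where the 300K figure for 100K blocks comes from; under the second it stays equal to the block count.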

 From: Sai Sai <[EMAIL PROTECTED]>
Sent: Friday, 12 April 2013 1:37 PM
Subject: Re: Does a Map task run 3 times on 3 TTs or just once
Just wondering whether it is right to assume that a map task is run 3 times on 3 different TaskTrackers (TTs) in parallel, and that the output of whichever finishes first is picked up and written to the intermediate location.
Or is it true that a map task, even though its data is replicated 3 times, runs only once, with the other 2 replicas on standby: if it fails, a second attempt runs, followed by a third if the second one also fails?
Please shed some light.