
Kai Voigt 2013-04-13, 01:45
Re: 100K Maps scenario
No, only one copy of each block will be processed.

If a task fails, it will be retried on another copy of the block. Also, if speculative execution is enabled, a slow task might be executed twice in parallel, but that happens only rarely.
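To make the arithmetic concrete: the number of map tasks is determined by the number of input splits (by default, one per HDFS block), not by the number of replicas. A minimal sketch, assuming a hypothetical 128 MB block size and a file spanning 100K blocks:

```java
public class SplitCount {
    // One map task per input split (by default, one split per HDFS block),
    // regardless of the replication factor.
    static long numSplits(long fileSize, long blockSize) {
        return (fileSize + blockSize - 1) / blockSize; // ceiling division
    }

    public static void main(String[] args) {
        long blockSize = 128L << 20;            // 128 MB block size (assumption)
        long fileSize  = 100_000L * blockSize;  // file spanning 100K blocks
        int replication = 3;                    // HDFS replication factor

        long mapTasks = numSplits(fileSize, blockSize);
        System.out.println(mapTasks);           // 100000, not 300000
        // Replication gives the scheduler up to 3 candidate nodes per task
        // (data locality), but each split is processed exactly once.
    }
}
```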

On 12.04.2013, at 18:45, Sai Sai <[EMAIL PROTECTED]> wrote:

> Just a follow up to see if anyone can shed some light on this:
> My understanding is that, after each block is replicated 3 times, a map task is run on each of the replicas in parallel.
> What I am trying to double-check is that, in a scenario where a file is split into 10K or 100K or more blocks, this would result in at least 300K map tasks being performed, which looks like overkill from a performance or just a logical perspective.
> Will appreciate any thoughts on this.
> Thanks
> Sai
> From: Sai Sai <[EMAIL PROTECTED]>
> Sent: Friday, 12 April 2013 1:37 PM
> Subject: Re: Does a Map task run 3 times on 3 TTs or just once
> Just wondering if it is right to assume that a map task is run 3 times on 3 different TTs in parallel, and whichever completes the task first has its output picked up and written to the intermediate location.
> Or is it true that a map task, even though its data is replicated 3 times, will run only once, with the other 2 replicas on standby: if the first attempt fails, the second one will run, followed by the 3rd if the 2nd mapper fails.
> Please shed some light.
> Thanks
> Sai

Kai Voigt
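The speculative execution mentioned above is configurable per cluster or per job. As a sketch, using the property names from the Hadoop 1.x / TaskTracker era in use at the time (later releases renamed them to mapreduce.map.speculative and mapreduce.reduce.speculative):

```xml
<!-- mapred-site.xml (or set per job): toggle speculative attempts -->
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>true</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>true</value>
</property>
```

With these enabled, the framework may launch a second attempt of a straggling task on another node and keep whichever attempt finishes first; the duplicate is killed, so each split still contributes exactly one output.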