MapReduce >> mail # user >> Re: 100K Maps scenario


Re: 100K Maps scenario


Just following up to see if anyone can shed some light on this.
My understanding is that after each block has been replicated 3 times, a map task is run on each of the replicas in parallel.
What I am trying to verify is this: if a file is split into 10K, 100K, or more blocks, that would mean 3 map tasks per block (at least 300K map tasks in the 100K case), which looks like overkill from a performance perspective, or even just a logical one.
I would appreciate any thoughts on this.
Thanks
Sai

________________________________
 From: Sai Sai <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; Sai Sai <[EMAIL PROTECTED]>
Sent: Friday, 12 April 2013 1:37 PM
Subject: Re: Does a Map task run 3 times on 3 TTs or just once
 
Just wondering whether it is right to assume that a map task is run 3 times, on 3 different TaskTrackers (TTs) in parallel, and that the output of whichever one finishes first is picked up and written to the intermediate location.
Or is it the case that a map task runs only once, even though its data is replicated 3 times, with the other 2 replicas on standby, so that if the first attempt fails a second one runs, followed by a third if the second also fails?
Please shed some light on this.
Thanks
Sai
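
For what it's worth, here is a rough sketch of the arithmetic behind the question. As far as I know, Hadoop schedules one map task per input split (by default roughly one split per HDFS block), and the 3 replicas only give the scheduler a choice of nodes that hold a local copy; extra attempts of the same task are started only after a failure or under speculative execution. The function name and the 64 MB block size below are assumptions made for illustration, not anything taken from Hadoop itself, and the estimate ignores how the input format treats the tail of a file.

```python
def estimate_map_tasks(file_size_bytes, split_size_bytes, replication=3):
    """Rough estimate of the map task count for a single input file.

    Hypothetical helper, assuming one map task per split and one split
    per block. `replication` is accepted only to make the point that it
    does not enter the calculation.
    """
    if file_size_bytes < 0:
        raise ValueError("file_size_bytes must not be negative")
    if split_size_bytes <= 0:
        raise ValueError("split_size_bytes must be positive")
    # Integer ceiling division: a partial final block still needs a task.
    return (file_size_bytes + split_size_bytes - 1) // split_size_bytes


BLOCK_SIZE = 64 * 1024 * 1024        # assumed block size of 64 MB
FILE_SIZE = 100_000 * BLOCK_SIZE     # a file that occupies 100K blocks

tasks_with_3_replicas = estimate_map_tasks(FILE_SIZE, BLOCK_SIZE, replication=3)
tasks_with_1_replica = estimate_map_tasks(FILE_SIZE, BLOCK_SIZE, replication=1)
print(tasks_with_3_replicas, tasks_with_1_replica)
```

Under these assumptions the estimate for the 100K-block file stays at about 100K map tasks whether the replication factor is 1 or 3, rather than growing to 300K.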