Re: Questions with regard to scheduling of map and reduce tasks

You don't need to touch the code related to protocol-buffer records at all; there are java-native interfaces for everything, e.g. org.apache.hadoop.yarn.api.AMRMProtocol.
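
A rough illustration of an allocate heartbeat through that interface (a sketch only, against the 2.0.x-alpha records API; the RPC proxy setup and ApplicationAttemptId plumbing are elided, and the class name is made up):

import java.util.Collections;

import org.apache.hadoop.yarn.api.AMRMProtocol;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateRequest;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.ResourceRequest;
import org.apache.hadoop.yarn.util.Records;

public class AllocateHeartbeat {
  // 'rm' would be an RPC proxy to the ResourceManager, e.g. obtained via
  // YarnRPC.create(conf).getProxy(AMRMProtocol.class, rmAddress, conf).
  static AllocateResponse heartbeat(AMRMProtocol rm, ResourceRequest ask,
                                    int responseId) throws Exception {
    AllocateRequest req = Records.newRecord(AllocateRequest.class);
    // a real request also needs setApplicationAttemptId(...)
    req.setResponseId(responseId);                            // heartbeat sequence number
    req.addAllAsks(Collections.singletonList(ask));           // containers we want
    req.addAllReleases(Collections.<ContainerId>emptyList()); // none to give back
    return rm.allocate(req); // plain Java records; protobuf stays behind the scenes
  }
}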

Regarding your question: the JobClient first obtains the locations of DFS blocks via InputFormat.getSplits() and uploads the accumulated information into a split file; see Job.submitInternal() -> JobSubmitter.writeSplits() -> ...
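
If you just want to see that locality information, you can call getSplits() yourself. A minimal standalone sketch (the class name is made up; it assumes a 2.x-style Job and TextInputFormat):

import java.util.Arrays;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class SplitLocations {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance();
    FileInputFormat.addInputPath(job, new Path(args[0]));

    // The same call the client makes before writing the split file.
    for (InputSplit split : new TextInputFormat().getSplits(job)) {
      // getLocations() lists hosts holding the underlying DFS blocks;
      // this is the locality hint that ends up in the split file.
      System.out.println(split + " -> " + Arrays.toString(split.getLocations()));
    }
  }
}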

The MR AM then downloads and reads the split file, reconstructs the split information, and creates TaskAttempts (TAs), which then use it to request containers. See the MRAppMaster code, specifically JobImpl.InitTransition, for how TAs are created with host information.
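
The resulting ask is, roughly, one ResourceRequest per preferred host (plus rack-level and "*" any-node asks). A sketch of its shape (in the real AM this lives in RMContainerAllocator; setHostName() was later renamed setResourceName(), and the numbers here are illustrative):

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;
import org.apache.hadoop.yarn.util.Records;

public class MapTaskAsk {
  static ResourceRequest askFor(String host, int memoryMb, int nContainers) {
    ResourceRequest ask = Records.newRecord(ResourceRequest.class);
    ask.setHostName(host);            // a split location, a rack, or "*"
    Resource capability = Records.newRecord(Resource.class);
    capability.setMemory(memoryMb);   // memory per container
    ask.setCapability(capability);
    ask.setNumContainers(nContainers);
    Priority pri = Records.newRecord(Priority.class);
    pri.setPriority(20);              // the MR AM uses a fixed priority per task type
    ask.setPriority(pri);
    return ask;
  }
}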

HTH,

+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/

On Aug 31, 2012, at 4:17 AM, Vasco Visser wrote:

> Thanks again for the reply, it is becoming clear.
>
> While on the subject of going over the code, do you know by any chance
> where the piece of code is that creates resource requests according to
> the locations of HDFS blocks? I am looking for that, but the protocol
> buffer stuff makes it difficult for me to understand what is going on.
>
> regards, Vasco
>