MapReduce >> mail # user >> NM/AM interaction


John Lilley 2013-05-28, 22:43
Re: NM/AM interaction
Every AuxiliaryService is configured in yarn-site.xml under a service
ID, and a class is associated with that ID. For MR2, one generally
adds the two properties below: the first defines the name (service
ID), the second defines the class to launch for that name:

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce.shuffle</value>
</property>

<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
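
To show how the two properties relate, here is a minimal Java sketch that resolves each configured service ID to its class name. It uses a plain Map in place of Hadoop's Configuration, and AuxServiceConfig and resolve() are hypothetical names of mine, not NodeManager code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: resolves aux-service IDs to class names.
class AuxServiceConfig {
    static final String SERVICES_KEY = "yarn.nodemanager.aux-services";
    static final String CLASS_KEY_FORMAT = "yarn.nodemanager.aux-services.%s.class";

    // Returns service ID -> class name, in the order the IDs are listed.
    static Map<String, String> resolve(Map<String, String> conf) {
        Map<String, String> result = new LinkedHashMap<>();
        String ids = conf.getOrDefault(SERVICES_KEY, "");
        for (String id : ids.split(",")) {
            id = id.trim();
            if (id.isEmpty()) {
                continue; // nothing configured, or a stray comma
            }
            String className = conf.get(String.format(CLASS_KEY_FORMAT, id));
            if (className == null) {
                throw new IllegalArgumentException("No class configured for aux service " + id);
            }
            result.put(id, className);
        }
        return result;
    }
}
```

The point is only that the second property's key is derived from the value of the first, so the two must agree on the service ID.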

As part of the interface, an AuxiliaryService may hand back some
metadata that its clients need to know. For the ShuffleHandler the
port is the important part, so it serializes the port and exposes it
via getMeta(). [1]
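
A rough sketch of that serialization step in plain Java, assuming the port is encoded as a single 4-byte integer. ShufflePortMeta and its methods are hypothetical stand-ins, not the ShuffleHandler's actual code (see [1] for that).

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for the ShuffleHandler's metadata handling.
class ShufflePortMeta {
    // Encode the port as a 4-byte big-endian int, flipped so it is ready to read.
    static ByteBuffer serialize(int port) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES);
        buf.putInt(port);
        buf.flip();
        return buf;
    }

    // Decode from a duplicate so the caller's buffer position is left untouched.
    static int deserialize(ByteBuffer meta) {
        return meta.duplicate().getInt();
    }
}
```

Reading through a duplicate means the same buffer can be handed to several readers without anyone having to rewind it.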

Every successful startContainer(…) response from the NodeManager's
ContainerManager service carries the metadata of all available
auxiliary services. It is a mapping of (configured service ID ->
metadata) with one entry for every aux service that is configured and
currently running. [2]
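
On the NodeManager side, assembling that mapping could look roughly like the sketch below. MetaProvider and StartResponseSketch are hypothetical names, and skipping services that return no metadata is a choice of this sketch, not something I have checked against the real code in [2].

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface; the real AuxiliaryService API is larger than this.
interface MetaProvider {
    ByteBuffer getMeta();
}

class StartResponseSketch {
    // Collect metadata from every running service, keyed by its configured ID.
    static Map<String, ByteBuffer> collectMeta(Map<String, MetaProvider> running) {
        Map<String, ByteBuffer> all = new HashMap<>();
        for (Map.Entry<String, MetaProvider> entry : running.entrySet()) {
            ByteBuffer meta = entry.getValue().getMeta();
            if (meta != null) {
                // Read-only view, so a client cannot alter the service's own copy.
                all.put(entry.getKey(), meta.asReadOnlyBuffer());
            }
        }
        return all;
    }
}
```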

A client, such as MR2, receives this batch of metadata, picks out the
entry for the service ID string it is aware of [4], and deserializes
it [3].
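
The client-side lookup can be sketched like this, assuming the metadata arrives as a Map keyed by service ID and the port is stored as one 4-byte int. ShufflePortLookup is a hypothetical helper, not the code in [3].

```java
import java.nio.ByteBuffer;
import java.util.Map;

// Hypothetical client-side helper; names are illustrative only.
class ShufflePortLookup {
    // The service ID the client knows in advance; it must match the NM configuration.
    static final String SHUFFLE_SERVICE_ID = "mapreduce.shuffle";

    static int shufflePort(Map<String, ByteBuffer> serviceMeta) {
        ByteBuffer meta = serviceMeta.get(SHUFFLE_SERVICE_ID);
        if (meta == null) {
            throw new IllegalStateException("No metadata for service " + SHUFFLE_SERVICE_ID);
        }
        // Assumes the metadata holds the port as one 4-byte int.
        return meta.duplicate().getInt();
    }
}
```

If the NodeManager is configured with a different service ID than the one the client expects, the lookup finds nothing, which is why the ID string has to match on both sides.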

[1] - https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/AuxServices.java#L195
and https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java#L328
[2] - https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java#L502
[3] - https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java#L165
[4] - https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java#L148

On Wed, May 29, 2013 at 4:13 AM, John Lilley <[EMAIL PROTECTED]> wrote:
> I was reading from the HortonWorks blog:
>
> “How MapReduce shuffle takes advantage of NM’s Auxiliary-services
>
> The Shuffle functionality required to run a MapReduce (MR) application is
> implemented as an Auxiliary Service. This service starts up a Netty Web
> Server, and knows how to handle MR specific shuffle requests from Reduce
> tasks. The MR AM specifies the service id for the shuffle service, along
> with security tokens that may be required. The NM provides the AM with the
> port on which the shuffle service is running which is passed onto the Reduce
> tasks.”
>
> How does the AM get the service ID and the port?
>
> Thanks,
>
> John
>
>

--
Harsh J