Queries on next gen MR architecture

I have been going through the MRv2 documentation and have the following queries.

1) Let's say that an InputSplit is on Node1 and Node2.

Can the ApplicationMaster ask the ResourceManager for a container either on
Node1 or Node2 with an OR condition?
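To make the question concrete, here is a minimal sketch of the behaviour I have in mind. It is plain Java with made-up types (`ContainerAsk`, `ToyScheduler`); none of it is the actual YARN API, and the first-fit policy is only my assumption of what an "OR" request could mean.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a container request; NOT the real YARN class.
final class ContainerAsk {
    final List<String> preferredNodes; // any one of these is acceptable (the "OR")
    final int memoryMb;

    ContainerAsk(List<String> preferredNodes, int memoryMb) {
        this.preferredNodes = preferredNodes;
        this.memoryMb = memoryMb;
    }
}

// Toy scheduler that only tracks free memory per node.
final class ToyScheduler {
    private final Map<String, Integer> freeMemoryMb = new LinkedHashMap<>();

    void addNode(String node, int memoryMb) {
        freeMemoryMb.put(node, memoryMb);
    }

    // Returns the first preferred node with enough free memory, if any,
    // and deducts the requested memory from that node.
    Optional<String> allocate(ContainerAsk ask) {
        for (String node : ask.preferredNodes) {
            Integer free = freeMemoryMb.get(node);
            if (free != null && free >= ask.memoryMb) {
                freeMemoryMb.put(node, free - ask.memoryMb);
                return Optional.of(node);
            }
        }
        return Optional.empty();
    }
}

final class LocalityAskDemo {
    public static void main(String[] args) {
        ToyScheduler scheduler = new ToyScheduler();
        scheduler.addNode("Node1", 512);
        scheduler.addNode("Node2", 2048);

        // The split has replicas on Node1 and Node2; either one is fine.
        ContainerAsk ask = new ContainerAsk(Arrays.asList("Node1", "Node2"), 1024);
        System.out.println(scheduler.allocate(ask).orElse("no node available"));
    }
}
```

In this toy run Node1 is too small for the request, so the container lands on Node2. What I want to know is whether the real AM-to-RM protocol lets me express the same "either node" preference in a single request.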

2) > The Scheduler receives periodic information about the resource usages
on allocated resources from the NodeManagers. The Scheduler also makes
available status of completed Containers to the appropriate ApplicationMaster.

What's the use of NM sending the resource usages to the scheduler?

Why can't the NM directly talk to the AM about the completed containers?
Does any information pass from NM to AM?

3) >The Map-Reduce ApplicationMaster has the following components:
> TaskUmbilical – The component responsible for receiving heartbeats and
status updates from the map and reduce tasks.

Does the communication happen directly between the container and the AM? If
yes, the task completion status could also be sent from the container to
the AM.

4) > The Hadoop Map-Reduce JobClient polls the ASM to obtain information
about the MR AM and then directly talks to the AM for status, counters etc.

Once the job is completed, the AM goes down. What happens to the counters then?
What is the flow of the Counter (Container -> NM -> AM)?
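This is how I picture the flow from the quoted sentence, as a toy sketch. All class and method names (`ToyAppMaster`, `ToyAppsManager`, `readCounter`) are invented for illustration and are not the real YARN or MapReduce API. I have also assumed that the counters live only in the AM's memory, which may well be wrong.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the MR ApplicationMaster; counters are held in memory only.
final class ToyAppMaster {
    private final Map<String, Long> counters = new HashMap<>();

    void increment(String name, long delta) {
        counters.merge(name, delta, Long::sum);
    }

    long counter(String name) {
        return counters.getOrDefault(name, 0L);
    }
}

// Hypothetical stand-in for the ASM: maps application ids to running AMs.
final class ToyAppsManager {
    private final Map<String, ToyAppMaster> running = new HashMap<>();

    void register(String appId, ToyAppMaster am) {
        running.put(appId, am);
    }

    void unregister(String appId) {
        running.remove(appId);
    }

    Optional<ToyAppMaster> lookup(String appId) {
        return Optional.ofNullable(running.get(appId));
    }
}

final class CounterFlowDemo {
    // Client side: first ask the ASM where the AM is, then query the AM directly.
    static Optional<Long> readCounter(ToyAppsManager asm, String appId, String name) {
        return asm.lookup(appId).map(am -> am.counter(name));
    }

    public static void main(String[] args) {
        ToyAppsManager asm = new ToyAppsManager();
        ToyAppMaster am = new ToyAppMaster();
        asm.register("app_1", am);
        am.increment("MAP_INPUT_RECORDS", 5);

        // While the job runs, the client can reach the AM and read the counter.
        System.out.println(readCounter(asm, "app_1", "MAP_INPUT_RECORDS"));

        // Once the job finishes and the AM goes away, the lookup comes back empty.
        asm.unregister("app_1");
        System.out.println(readCounter(asm, "app_1", "MAP_INPUT_RECORDS"));
    }
}
```

In this toy version the counters disappear together with the AM, which is exactly the case I am asking about: where do they go in the real system once the AM has exited?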

5) If a new YARN application is created, how can the NM trust the request
from the AM?

6) > MapReduce NextGen uses wire-compatible protocols to allow different
versions of servers and clients to communicate with each other.

What is meant by `wire-compatible protocols`, and how are they implemented?

7) > The computation framework (ResourceManager and NodeManager) is
completely generic and is free of MapReduce specificities.

Is this the reason for adding auxiliary services for shuffling to the NM?