Queries on next gen MR architecture
Praveen Sripati 2012-01-05, 16:29
I have been going through the MRv2 documentation and have the following
queries:
1) Let's say that an InputSplit is on Node1 and Node2.
Can the ApplicationMaster ask the ResourceManager for a container either on
Node1 or Node2 with an OR condition?
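To make question 1 concrete, here is a toy model of the "either Node1 or Node2" request I have in mind. This is only a sketch of the semantics being asked about: `ContainerRequest`, `ToyScheduler`, and `allocate` are hypothetical names, not the YARN API, and the real ResourceManager may express locality quite differently.

```python
from dataclasses import dataclass, field


@dataclass
class ContainerRequest:
    # Hypothetical request: `count` containers, each acceptable on any listed host.
    acceptable_hosts: list
    count: int = 1


@dataclass
class ToyScheduler:
    # Free container slots per host (illustrative numbers only).
    free_slots: dict = field(default_factory=dict)

    def allocate(self, request):
        """Grant up to request.count containers, trying hosts in listed order."""
        granted = []
        for host in request.acceptable_hosts:
            while len(granted) < request.count and self.free_slots.get(host, 0) > 0:
                self.free_slots[host] -= 1
                granted.append(host)
        return granted


# Node1 is full, so the single container should land on Node2.
scheduler = ToyScheduler(free_slots={"Node1": 0, "Node2": 2})
request = ContainerRequest(acceptable_hosts=["Node1", "Node2"])
granted = scheduler.allocate(request)
print(granted)
```

The point of the sketch is that the request is satisfied by one container on whichever listed node has capacity, rather than by one container per node.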
2) > The Scheduler receives periodic information about the resource usages
on allocated resources from the NodeManagers. The Scheduler also makes
available status of completed Containers to the appropriate
ApplicationMaster.
What is the use of the NM sending resource usage to the Scheduler?
Why can't the NM talk directly to the AM about the completed containers?
Does any information pass from the NM to the AM?
3) >The Map-Reduce ApplicationMaster has the following components:
> TaskUmbilical – The component responsible for receiving heartbeats and
status updates from the map and reduce tasks.
Does the communication happen directly between the container and the AM? If
yes, the task completion status could also be sent from the container to
the AM.
4) > The Hadoop Map-Reduce JobClient polls the ASM to obtain information
about the MR AM and then directly talks to the AM for status, counters etc.
Once the job is completed, the AM goes down. What happens to the Counters then?
What is the flow of the Counters (Container -> NM -> AM)?
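One flow I can imagine, stated purely as an assumption and not as confirmed MRv2 behaviour, is that each task reports counter increments straight to the AM, and the AM saves a final snapshot before it exits. The sketch below models only that idea; `ToyAppMaster`, `status_update`, and `finish` are hypothetical names.

```python
from collections import Counter


class ToyAppMaster:
    """Toy stand-in for an AM that aggregates counters reported by tasks."""

    def __init__(self):
        self.job_counters = Counter()
        self.persisted = None  # snapshot that would outlive the AM

    def status_update(self, task_id, counters):
        # Hypothetical umbilical call: a task reports counter increments.
        self.job_counters.update(counters)

    def finish(self):
        # Assumption: the AM writes the totals somewhere durable before exiting.
        self.persisted = dict(self.job_counters)
        return self.persisted


am = ToyAppMaster()
am.status_update("map_0", {"MAP_INPUT_RECORDS": 100})
am.status_update("map_1", {"MAP_INPUT_RECORDS": 50, "SPILLED_RECORDS": 7})
final_counters = am.finish()
print(final_counters)
```

If the counters instead travel through the NM, the aggregation step would move, but the question of where the final snapshot lives after the AM exits stays the same.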
5) If a new YARN application is created, how can the NM trust the request
6) > MapReduce NextGen uses wire-compatible protocols to allow different
versions of servers and clients to communicate with each other.
What is meant by `wire-compatible protocols`, and how are they implemented?
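My rough understanding is that "wire-compatible" means the byte-level message format tolerates fields added by a newer version. The sketch below illustrates that general idea with a made-up tag-length-value format. It is an assumption for discussion and not Hadoop's actual wire format; `encode` and `decode` are hypothetical helpers.

```python
import struct


def encode(fields):
    """Encode {tag: bytes} as repeated (1-byte tag, 2-byte length, value) records."""
    out = b""
    for tag, value in fields.items():
        out += struct.pack(">BH", tag, len(value)) + value
    return out


def decode(data, known_tags):
    """Decode records, keeping known tags and silently skipping unknown ones."""
    result = {}
    offset = 0
    while offset < len(data):
        tag, length = struct.unpack_from(">BH", data, offset)
        offset += 3  # size of the tag + length header
        value = data[offset:offset + length]
        offset += length
        if tag in known_tags:
            result[tag] = value
    return result


# A newer client sends field 3, which an older server does not know about.
message = encode({1: b"job_42", 2: b"RUNNING", 3: b"new-field"})
old_server_view = decode(message, known_tags={1, 2})
new_server_view = decode(message, known_tags={1, 2, 3})
print(old_server_view)
print(new_server_view)
```

Because every record carries its own length, the older reader can step over field 3 without failing, which is the property I assume the documentation is referring to.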
7) > The computation framework (ResourceManager and NodeManager) is
completely generic and is free of MapReduce specificities.
Is this the reason for adding auxiliary services for shuffling to the NM?