Re: Queries on next gen MR architecture
Praveen Sripati 2012-01-07, 02:23
Could someone please clarify the queries below?
On Thu, Jan 5, 2012 at 9:59 PM, Praveen Sripati <[EMAIL PROTECTED]> wrote:
> I have been going through the MRv2 documentation and have the following queries.
> 1) Suppose an InputSplit is stored on both Node1 and Node2.
> Can the ApplicationMaster ask the ResourceManager for a container on either
> Node1 or Node2, i.e. with an OR condition?
> 2) > The Scheduler receives periodic information about the resource usages
> > on allocated resources from the NodeManagers. The Scheduler also makes
> > available status of completed Containers to the appropriate ApplicationMasters.
> What is the purpose of the NM sending resource usage to the Scheduler?
> Why can't the NM talk directly to the AM about completed containers?
> Does any information pass from the NM to the AM?
> 3) > The Map-Reduce ApplicationMaster has the following components:
> > TaskUmbilical – The component responsible for receiving heartbeats and
> > status updates from the map and reduce tasks.
> Does the communication happen directly between the container and the AM?
> If so, the task completion status could also be sent from the container
> to the AM.
> 4) > The Hadoop Map-Reduce JobClient polls the ASM to obtain information
> > about the MR AM and then directly talks to the AM for status, counters etc.
> Once the job completes, the AM goes down. What happens to the counters then?
> What is the flow of the counters (Container -> NM -> AM)?
> 5) When a new YARN application is created, how can the NM trust the
> requests from its AM?
> 6) > MapReduce NextGen uses wire-compatible protocols to allow different
> > versions of servers and clients to communicate with each other.
> What is meant by `wire-compatible protocols`, and how is it implemented?
> 7) > The computation framework (ResourceManager and NodeManager) is
> > completely generic and is free of MapReduce specificities.
> Is this the reason the shuffle was added to the NM as an auxiliary service?
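On question 1: in YARN a resource request names a location (a host, a rack, or `*` for any node) together with a priority, a resource capability and a container count, so an AM can register the same demand at Node1, at Node2, at their rack and at `*`. The toy model below assumes that shape; `Split`, `RequestTable` and the locality labels are hypothetical names, not the Hadoop API. Because the `*` count caps the total, placing the container on either node uses up the ask, which gives the OR behaviour the question describes.

```python
from collections import defaultdict
from dataclasses import dataclass

ANY = "*"  # wildcard location: "any node in the cluster"


@dataclass(frozen=True)
class Split:
    """Hypothetical stand-in for an InputSplit's replica locations."""
    hosts: tuple
    racks: tuple


class RequestTable:
    """Hypothetical table of outstanding container asks, keyed by location."""

    def __init__(self):
        self.asks = defaultdict(int)

    def add_split(self, split):
        # One map task: ask at every replica host, every rack, and ANY.
        for name in (*split.hosts, *split.racks, ANY):
            self.asks[name] += 1

    def allocate(self, host, rack):
        """Scheduler side: try to place one container on `host` in `rack`."""
        if self.asks[ANY] == 0:
            return None  # every outstanding task already has a container
        if self.asks[host] > 0:
            level = "node-local"
        elif self.asks[rack] > 0:
            level = "rack-local"
        else:
            level = "off-switch"
        # Consume the host, rack and ANY asks that this placement satisfies.
        for name in (host, rack, ANY):
            if self.asks[name] > 0:
                self.asks[name] -= 1
        return level


table = RequestTable()
table.add_split(Split(hosts=("node1", "node2"), racks=("rack1",)))
print(table.allocate("node2", "rack1"))  # node-local
print(table.allocate("node1", "rack1"))  # None: the single task is already placed
```

In this sketch the leftover ask on node1 is harmless: once the `*` count reaches zero, no further container is handed out for that task.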
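On question 2, the quoted documentation describes completion status travelling NM -> RM -> AM. Below is a minimal sketch of that relay, assuming the NM piggybacks finished containers on its periodic heartbeat and the AM picks them up on its own periodic call to the RM; `ResourceManagerModel`, `node_heartbeat` and `allocate` are hypothetical names. One property the model makes visible is that an AM learns about completions from the RM alone, instead of polling every NM that runs one of its containers.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerStatus:
    """Hypothetical completed-container record."""
    container_id: str
    app_id: str
    exit_status: int


class ResourceManagerModel:
    """Hypothetical RM that relays completion status from NMs to AMs."""

    def __init__(self):
        self.pending = defaultdict(list)  # app_id -> statuses its AM hasn't seen
        self.node_usage = {}              # node -> last reported usage (MB)

    def node_heartbeat(self, node, used_mb, completed):
        # Periodic NM -> RM call: usage report plus newly finished containers.
        self.node_usage[node] = used_mb
        for status in completed:
            self.pending[status.app_id].append(status)

    def allocate(self, app_id):
        # Periodic AM -> RM call: hand over (and forget) this app's completions.
        return self.pending.pop(app_id, [])


rm = ResourceManagerModel()
rm.node_heartbeat("node1", 2048, [ContainerStatus("c1", "app1", 0),
                                  ContainerStatus("c2", "app2", 1)])
print([s.container_id for s in rm.allocate("app1")])  # ['c1']
```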
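On question 3, going by the quoted description of TaskUmbilical, the task running in each container reports straight to the MR AM rather than through the NM. The sketch below shows an AM-side listener, under hypothetical names and with an injectable clock so it can be tested, that records progress, accepts a completion message over the same channel, and flags attempts that have stayed silent longer than a timeout.

```python
import time


class TaskUmbilical:
    """Hypothetical AM-side listener for task heartbeats and status updates."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_seen = {}  # task attempt id -> time of last message
        self.progress = {}   # task attempt id -> last reported progress (0.0..1.0)
        self.done = set()

    def status_update(self, attempt, progress):
        # Sent by the task in its container directly to the AM.
        self.last_seen[attempt] = self.clock()
        self.progress[attempt] = progress

    def task_done(self, attempt):
        # Completion travels over the same channel as status updates.
        self.last_seen[attempt] = self.clock()
        self.done.add(attempt)

    def timed_out(self):
        # Unfinished attempts not heard from within the timeout.
        now = self.clock()
        return sorted(a for a, seen in self.last_seen.items()
                      if a not in self.done and now - seen > self.timeout_s)


now = [0.0]
umbilical = TaskUmbilical(timeout_s=10, clock=lambda: now[0])
umbilical.status_update("m_0", 0.5)
umbilical.status_update("m_1", 0.2)
now[0] = 5.0
umbilical.task_done("m_1")
now[0] = 11.0
print(umbilical.timed_out())  # ['m_0']
```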
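On question 4, counters can ride on the same task-to-AM status updates, so no NM hop is needed; once the job finishes, the totals have to be persisted somewhere that outlives the AM (in Hadoop this is the job history, served by the JobHistoryServer). The aggregation step is sketched below, assuming each update carries the attempt's cumulative counters; `JobCounters` and its methods are hypothetical, and `MAP_INPUT_RECORDS` is used only as an example key.

```python
from collections import Counter


class JobCounters:
    """Hypothetical AM-side aggregation of counters from task status updates."""

    def __init__(self):
        self.latest = {}  # attempt id -> most recent cumulative snapshot

    def status_update(self, attempt, counters):
        # Snapshots are assumed cumulative, so replace rather than add.
        self.latest[attempt] = Counter(counters)

    def attempt_failed(self, attempt):
        # In this model a failed attempt's partial counts are discarded.
        self.latest.pop(attempt, None)

    def totals(self):
        total = Counter()
        for snapshot in self.latest.values():
            total.update(snapshot)
        return dict(total)


job = JobCounters()
job.status_update("m_0", {"MAP_INPUT_RECORDS": 10})
job.status_update("m_0", {"MAP_INPUT_RECORDS": 25})  # replaces the 10
job.status_update("m_1", {"MAP_INPUT_RECORDS": 5})
print(job.totals())  # {'MAP_INPUT_RECORDS': 30}
```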
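On question 5, as I understand it YARN's answer is token based: with each allocated container the RM gives the AM a container token, and the NM checks that token before launching the container. The sketch below models the idea with an HMAC over a secret shared by the RM and the NMs but never given to AMs, so an AM cannot forge or alter a token. The token layout, field order and the `ContainerTokenSecret` class are assumptions for illustration, not Hadoop's actual format.

```python
import hashlib
import hmac
import time


class ContainerTokenSecret:
    """Hypothetical key shared by the RM (issuer) and NMs (verifiers)."""

    def __init__(self, master_key: bytes):
        self.master_key = master_key

    def _sign(self, payload: bytes) -> bytes:
        return hmac.new(self.master_key, payload, hashlib.sha256).digest()

    def issue(self, container_id, node, memory_mb, expires_at):
        # RM side: bind the token to one container, node, size and expiry.
        payload = f"{container_id}|{node}|{memory_mb}|{expires_at}".encode()
        return payload, self._sign(payload)

    def verify(self, payload, signature, node, now=None):
        # NM side: reject forged, tampered, expired or misdirected tokens.
        if not hmac.compare_digest(self._sign(payload), signature):
            return False
        _, token_node, _, expires_at = payload.decode().split("|")
        now = time.time() if now is None else now
        return token_node == node and now < float(expires_at)


secret = ContainerTokenSecret(b"rm-and-nm-shared-key")
payload, sig = secret.issue("container_01", "node1", 1024, expires_at=100.0)
print(secret.verify(payload, sig, "node1", now=50.0))  # True
print(secret.verify(payload.replace(b"1024", b"8192"), sig, "node1", now=50.0))  # False
```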
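On question 6, Hadoop's NextGen RPC records are defined with Protocol Buffers, where each field carries a numeric tag and a reader skips tags it does not recognise. That skipping is what lets an older server accept a message from a newer client and vice versa. The hand-rolled codec below only mimics two protobuf wire types (varint and length-delimited) to show the mechanism; it is a sketch, not Hadoop's or protobuf's code, and the field numbers in the usage lines are hypothetical.

```python
def encode_varint(n: int) -> bytes:
    # Little-endian base-128: 7 bits per byte, high bit means "more follows".
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int):
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode(fields: dict) -> bytes:
    """Encode {field_number: int | bytes} with protobuf-style tags."""
    out = bytearray()
    for number, value in sorted(fields.items()):
        if isinstance(value, int):
            out += encode_varint(number << 3 | 0)  # wire type 0: varint
            out += encode_varint(value)
        else:
            out += encode_varint(number << 3 | 2)  # wire type 2: length-delimited
            out += encode_varint(len(value)) + value
    return bytes(out)


def decode(buf: bytes, known: set) -> dict:
    """Decode, keeping only `known` field numbers and skipping the rest."""
    fields, pos = {}, 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        number, wire_type = key >> 3, key & 7
        if wire_type == 0:
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:
            length, pos = decode_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if number in known:
            fields[number] = value
    return fields


# A newer client adds field 3; an older server that only knows 1 and 2 skips it.
newer = encode({1: b"app_1", 2: 1024, 3: 4})
print(decode(newer, known={1, 2}))  # {1: b'app_1', 2: 1024}
```

Going the other way, a newer reader given an older message simply finds field 3 absent and would fall back to a default.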
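On question 7, the NM can host pluggable auxiliary services (selected through the `yarn.nodemanager.aux-services` setting), and the MapReduce shuffle runs as one of them, which keeps shuffle logic out of the generic NM code. The sketch below assumes a simple registry: the NM starts the configured services by name and passes each one an opaque per-application payload taken from the container launch request. All class and method names here are hypothetical.

```python
class AuxService:
    """Hypothetical base class for a service hosted inside the NodeManager."""

    def init_app(self, app_id, data):
        pass

    def stop_app(self, app_id):
        pass


class ShuffleService(AuxService):
    """Toy framework-specific service; the NM core knows nothing about it."""

    def __init__(self):
        self.apps = {}  # app_id -> opaque data, e.g. a secret for reducers

    def init_app(self, app_id, data):
        self.apps[app_id] = data

    def stop_app(self, app_id):
        self.apps.pop(app_id, None)


class NodeManagerModel:
    def __init__(self, registry, configured):
        # `configured` plays the role of the NM's list of aux-service names.
        self.services = {name: registry[name]() for name in configured}

    def start_container(self, app_id, service_data):
        # Hand each named service its payload; the NM never inspects it.
        for name, data in service_data.items():
            self.services[name].init_app(app_id, data)

    def finish_app(self, app_id):
        for service in self.services.values():
            service.stop_app(app_id)


nm = NodeManagerModel({"shuffle": ShuffleService}, ["shuffle"])
nm.start_container("app1", {"shuffle": b"job-secret"})
print(nm.services["shuffle"].apps)  # {'app1': b'job-secret'}
```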