Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Plain View
MapReduce >> mail # user >> Queries on next gen MR architecture


+
Praveen Sripati 2012-01-05, 16:29
+
Praveen Sripati 2012-01-07, 02:23
+
Arun C Murthy 2012-01-07, 18:38
Copy link to this message
-
Re: Queries on next gen MR architecture
Thanks for the response.

I was just thinking why some of the design decisions were made with MRv2.

> No, the OR condition is implied by the hierarchy of requests (node, rack,
*).

If InputSplit1 is on Node11 and Node12 and InputSplit2 on Node21 and
Node22. Then the AM can ask for 1 containers on each of the nodes and * as
2 for map tasks. Then the RM can return  2 nodes on Node11 and make * as 0.
The data locality is lost for InputSplit2 or else the AM has to make
another call to RM releasing one of the container and asking for another
container. A bit more complex request specifying the dependencies might be
more effective.

> NM doesn't make any 'out' calls to anyone by RM, else it would be severe
scalability bottleneck.

There is already a one-way communication between the AM and NM for
launching the containers. The response can from the NM can hold the list of
completed containers from the previous call.

> All interactions (RPCs) are authenticated. Also, there is a container
token provided by the RM (during allocation) which is verified by the NM
during container launch.

So, a shared key has to be deployed manually on all the nodes for the NM?

Regards,
Praveen

On Sun, Jan 8, 2012 at 12:08 AM, Arun C Murthy <[EMAIL PROTECTED]> wrote:

>
> On Jan 5, 2012, at 8:29 AM, Praveen Sripati wrote:
>
> Hi,
>
> I had been going through the MRv2 documentation and have the following
> queries
>
> 1) Let's say that an InputSplit is on Node1 and Node2.
>
> Can the ApplicationMaster ask the ResourceManager for a container either
> on Node1 or Node2 with an OR condition?
>
>
> No, the OR condition is implied by the hierarchy of requests (node, rack,
> *).
>
> In this case, assuming topology is node1/rack1 node2/rack1, requests would
> be:
> node1 -> 1
> node2 -> 1
> rack1 -> 1
> * -> 1
>
> OTOH, if the topology is node1/rack1, node2/rack2, requests would be:
> node1 -> 1
> node2 -> 1
> rack1 -> 1
> rack2 -> 1
> * -> 1
>
> In both cases, * would limit the #allocated-containers to '1'.
>
> In the first case rack1 itself (independent of *) would limit
> #allocated-containers to 1.
>
> More details here:
> http://developer.yahoo.com/blogs/hadoop/posts/2011/03/mapreduce-nextgen-scheduler/
> .
>
> I'll work on incorporating this into our docs on hadoop.apache.org.
>
> 2) > The Scheduler receives periodic information about the resource usages
> on allocated resources from the NodeManagers. The Scheduler also makes
> available status of completed Containers to the appropriate
> ApplicationMaster.
>
> What's the use of NM sending the resource usages to the scheduler?
>
> Why can't the NM directly talk to the AM about the completed containers?
> Does any information pass from NM to AM?
>
>
> The NM sends resource usages to the scheduler to allow it to track
> resource utilization on each node and, in future, make smarter decisions
> about allocating extra containers on under-utilized nodes etc.
>
> NM doesn't make any 'out' calls to anyone by RM, else it would be severe
> scalability bottleneck.
>
> 3) >The Map-Reduce ApplicationMaster has the following components:
> > TaskUmbilical – The component responsible for receiving heartbeats and
> status updates form the map and reduce tasks.
>
> Does the communication happen directly between the container and the AM?If yes, the task completion status could also be sent from the container to
> the AM.
>
>
> Yes, it already is sent to AM.
>
> 4) > The Hadoop Map-Reduce JobClient polls the ASM to obtain information
> about the MR AM and then directly talks to the AM for status, counters etc.
>
> Once the Job is completed the AM goes down, what happens to the Counters?
> What is the flow of the Counter (Container -> NM -> AM)?
>
>
> Once jobs completes the Counters etc. are stored in JobHistory file (one
> per job) which is served up, if necessary, by the JobHistoryServer.
>
> 5) If a new YARN application is created. How can the NM trust the request
> from AM?
>
>
> All interactions (RPCs) are authenticated. Also, there is a container
+
Arun C Murthy 2012-01-08, 07:25
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB