-Re: communication path for task assignment in hadoop 2.X
Harsh J 2013-04-11, 12:56
My response inline.
On Thu, Apr 11, 2013 at 11:51 AM, hari <[EMAIL PROTECTED]> wrote:
> Hi list,
> I was looking at the node communication (in hadoop-2.0.3-alpha)
> responsible for task assignment. So
> far I saw that the resourcemanager and nodemanager
> exchange heartbeat request/response.
Very glad you're checking out YARN. Do provide feedback on what we can
improve over the YARN JIRA when you get to try it!
> I have a couple of questions on this:
> 1. I saw that the heartbeat response to nodemanager does
> not include new task assignment. I think in the previous
> versions (eg. 0.20.2) the task assignment was piggybacked in the
> heartbeat response to the tasktracker. In the v2.0.3, so far
> I could only see that the response can trigger nodemanager
> shutdown, reboot, app cleanup, and container cleanup. Are there
> any other actions being triggered on the nodemanager
> by the heartbeat response ?
Note: New term instead of 'Task', in YARN, is 'Container'.
Yes, the heartbeats between the NodeManager and the ResourceManager
does not account for container assignments anymore. Container launches
are handled by a separate NodeManager-embedded service called the
You can read all the functions a heartbeat currently does at a
NodeManager under its service NodeStatusUpdater code at .
> 2. Is there a different communication path for task assignment ?
> Is the scheduler making the remote calls or are there other classes outside
> of yarn responsible for making the remote calls ?
The latter. Scheduler no longer is responsible for asking NodeManagers
to launch the containers. An ApplicationMaster just asks Scheduler to
reserve it some containers with specified resource (Memory/CPU/etc.)
allocations on available or preferred NodeManagers, and then once the
ApplicationMaster gets a response back that the allocation succeeded,
it manually communicates with ContainerManager on the allocated
NodeManager, to launch the command needed to run the 'task'.
A good-enough example to read would be the DistributedShell example.
I've linked  to show where the above AppMaster -> ContainerManager
requesting happens in its ApplicationMaster implementation, which
should help clear this for you.
 - https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java#L392
[startContainer, stopContainer, getContainerStatus etc… are all
protocol-callable calls from an Application Master]
 - https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java#L433
[Link is to the heartbeat retry loop, and then the response processing
follows right after the loop]
 - https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java#L756
[See the whole method, i.e. above from this code line point which is
just the final RPC call, and you can notice how we build the
command/environment to launch our 'task']