HDFS >> mail # user >> Re: In YARN, how does a task tracker knows the address of a job tracker?

RE: In YARN, how does a task tracker knows the address of a job tracker?

What you are doing sounds familiar.  We are in the process of implementing, not exactly MapReduce, but a system that has to do many of the things that MapReduce does (find data splits, define tasks, choose execution affinity, launch an app master, etc.).

There is another special thing that MapReduce under YARN uses that a normal YARN app cannot easily access: "auxiliary services".  MapReduce sets up a YARN auxiliary service to serve mapper outputs to the reducers.  I think it is based on netty or jetty and HTTP.  The point is that the MR aux service is part of the Hadoop distro, so all MR has to do is tell the NM to run it.  Regular YARN apps don't have this luxury without installing jars on each node and adding them to the Hadoop stack's CLASSPATH.  There doesn't appear to be any standard or documented way to inject extra jars into the Hadoop install.  As they say, that exercise is left to the reader.
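To make "tell the NM to run it" concrete: aux services are registered per NodeManager in yarn-site.xml through the `yarn.nodemanager.aux-services` property plus one `yarn.nodemanager.aux-services.<name>.class` property per service. The sketch below only renders those properties from Java so it stays self-contained. `mapreduce_shuffle` and `org.apache.hadoop.mapred.ShuffleHandler` are the names used for the MR shuffle service in Hadoop 2.2 as far as I know; `my_service` and `com.example.MyAuxService` are hypothetical placeholders for a custom service, whose jar would still have to be installed on every node as described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AuxServiceConfigSketch {

    /** Builds the yarn-site.xml properties that register the given aux services (name -> class). */
    static Map<String, String> auxServiceProperties(Map<String, String> services) {
        Map<String, String> props = new LinkedHashMap<>();
        // Comma-separated list of service names the NodeManager should start.
        props.put("yarn.nodemanager.aux-services", String.join(",", services.keySet()));
        // One ".class" property per service, naming the implementation class.
        for (Map.Entry<String, String> e : services.entrySet()) {
            props.put("yarn.nodemanager.aux-services." + e.getKey() + ".class", e.getValue());
        }
        return props;
    }

    /** Renders the properties as <property> elements for pasting into yarn-site.xml. */
    static String toXml(Map<String, String> props) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append("<property>\n")
              .append("  <name>").append(e.getKey()).append("</name>\n")
              .append("  <value>").append(e.getValue()).append("</value>\n")
              .append("</property>\n");
        }
        return sb.toString();
    }

    /** Example input: the MR shuffle service plus one hypothetical custom service. */
    static Map<String, String> exampleServices() {
        Map<String, String> services = new LinkedHashMap<>();
        services.put("mapreduce_shuffle", "org.apache.hadoop.mapred.ShuffleHandler");
        services.put("my_service", "com.example.MyAuxService"); // hypothetical, not a real class
        return services;
    }

    public static void main(String[] args) {
        System.out.print(toXml(auxServiceProperties(exampleServices())));
    }
}
```

This only covers the configuration side. It does not solve the distribution problem: the NM loads the class from its own classpath, so the custom jar has to be present on each node before the NM starts.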


From: ricky l [mailto:[EMAIL PROTECTED]]
Sent: Thursday, November 21, 2013 3:40 PM
Subject: Re: In YARN, how does a task tracker knows the address of a job tracker?

Hi John, thanks for your reply. I suspect there will be some external communication between the AM and the container tasks. I am trying to implement a Hadoop-like system on YARN, and I wanted to sketch the high-level steps before starting the work. Thanks,
On Thu, Nov 21, 2013 at 3:27 PM, John Lilley <[EMAIL PROTECTED]> wrote:
MapReduce also communicates outside of what is directly supported by YARN.
In a YARN application, YARN itself provides very little direct communication between the client and the AM, or between the AM and the container tasks.
I think that an AM can report two pieces of information to the client: "state" and "percent complete".
However, at launch time an AM can open up a protocol port and tell the client and the container tasks how to connect back.
I don't know the details, but I believe that the MapReduce AM communicates directly with all mapper and reducer tasks as well as the client.
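A minimal sketch of that "open a port and tell the tasks how to connect back" pattern, using only plain Java sockets so it runs without a cluster. It is an illustration, not how the MapReduce AM is actually implemented. The environment variable names `MYAPP_AM_HOST` and `MYAPP_AM_PORT` and the one-line text protocol are hypothetical. In a real YARN app, the map built here would be handed to the container via `ContainerLaunchContext.setEnvironment(...)`, and the client would learn the same host and port from the values the AM passes to `registerApplicationMaster(host, port, trackingUrl)`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class AmCallbackSketch {
    // Hypothetical variable names; any names agreed between AM and task would do.
    static final String ENV_HOST = "MYAPP_AM_HOST";
    static final String ENV_PORT = "MYAPP_AM_PORT";

    /** AM side: describe the listening endpoint as environment entries for the task. */
    static Map<String, String> containerEnv(ServerSocket server) {
        Map<String, String> env = new HashMap<>();
        // Loopback keeps the demo local; a real AM would advertise its node's hostname.
        env.put(ENV_HOST, InetAddress.getLoopbackAddress().getHostAddress());
        env.put(ENV_PORT, Integer.toString(server.getLocalPort()));
        return env;
    }

    /** AM side: accept one task connection, read one status line, acknowledge it. */
    static String serveOnce(ServerSocket server) throws IOException {
        try (Socket s = server.accept();
             BufferedReader in = new BufferedReader(
                 new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                 new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true)) {
            String status = in.readLine();
            out.println("ACK " + status);
            return status;
        }
    }

    /** Task side: read the AM address from the environment map and report a status. */
    static String reportToAm(Map<String, String> env, String status) throws IOException {
        try (Socket s = new Socket(env.get(ENV_HOST), Integer.parseInt(env.get(ENV_PORT)));
             PrintWriter out = new PrintWriter(
                 new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
             BufferedReader in = new BufferedReader(
                 new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8))) {
            out.println(status);
            return in.readLine();
        }
    }

    /** Runs the AM and one task in the same JVM and returns the AM's reply. */
    static String demo() throws Exception {
        try (ServerSocket server = new ServerSocket(0)) { // port 0 = pick any free port
            Map<String, String> env = containerEnv(server);
            Thread am = new Thread(() -> {
                try {
                    serveOnce(server);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            am.start();
            String reply = reportToAm(env, "task-0 50%");
            am.join();
            return reply;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // prints "ACK task-0 50%"
    }
}
```

Binding to port 0 matters here: the AM cannot know in advance which ports are free on whatever node YARN places it, so it lets the OS choose and then publishes the result.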
From: ricky l [mailto:[EMAIL PROTECTED]]
Sent: Thursday, November 21, 2013 12:36 PM
Subject: Re: In YARN, how does a task tracker knows the address of a job tracker?

Thank you for the answer, Omkar.

I read the links, which were helpful. Though the concept of job tracker/task tracker does not exist in YARN MapReduce, doesn't it still use the job/task tracker binaries? I thought the application master runs the job tracker binary and the containers on each node run the task tracker binary. Thanks.

On Thu, Nov 21, 2013 at 2:06 PM, Omkar Joshi <[EMAIL PROTECTED]> wrote:

Starting with YARN there is no notion of job tracker and task tracker. Here is a quick summary of where their responsibilities went:
JobTracker :-
1) Resource management :- Now done by the Resource Manager (it does all the scheduling work).
2) Application state management :- Managing and launching new map/reduce tasks is now done by the Application Master. There is one per job, not one single entity in the cluster for all jobs as in MRv1.
TaskTracker :- Replaced by the Node Manager.

I would suggest you read the YARN blog post<http://hortonworks.com/blog/resource-localization-in-yarn-deep-dive/>. This will answer most of your questions. Also read this<http://www.slideshare.net/ovjforu/yarn-way-to-share-cluster-beyond> (slide 12) for how a job actually gets executed.

Omkar Joshi
Hortonworks Inc.<http://www.hortonworks.com>

On Thu, Nov 21, 2013 at 7:52 AM, ricky l <[EMAIL PROTECTED]> wrote:
Hi all,

I have a question about how a task tracker finds the job tracker's address when I submit an MR job through YARN. As far as I know, both the job tracker and the task trackers are launched through the application master, and I am curious about the details of the job and task tracker launch sequence.
