Re: YARN: LocalResources and file distribution
Robert,

 YARN, by default, will only download *resources* from a shared namespace (e.g. HDFS).

 If /home/hadoop/robert/large_jar.jar is available on each node, then you can specify the path as file:///home/hadoop/robert/large_jar.jar and it should work.

 Otherwise, you'll need to copy /home/hadoop/robert/large_jar.jar to HDFS and then specify hdfs://host:port/path/to/large_jar.jar.
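
 For reference, a rough sketch of the HDFS route in Java (the destination path and the resource key are illustrative, not from this thread; the calls follow the usual YARN 2.x client pattern, as in the distributedshell example):

    import java.io.IOException;
    import java.util.Collections;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
    import org.apache.hadoop.yarn.api.records.LocalResource;
    import org.apache.hadoop.yarn.api.records.LocalResourceType;
    import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;
    import org.apache.hadoop.yarn.util.ConverterUtils;
    import org.apache.hadoop.yarn.util.Records;

    public class JarLocalizer {
      public static ContainerLaunchContext localizeJar() throws IOException {
        // Copy the jar from the client's local disk into HDFS so every
        // NodeManager can fetch it. (If the jar were already present on every
        // node, you could skip the copy and point the resource URL at
        // file:///home/hadoop/robert/large_jar.jar instead.)
        FileSystem fs = FileSystem.get(new YarnConfiguration());
        Path localJar = new Path("/home/hadoop/robert/large_jar.jar");
        Path hdfsJar = new Path(fs.getHomeDirectory(), "large_jar.jar"); // illustrative destination
        fs.copyFromLocalFile(localJar, hdfsJar);
        FileStatus jarStatus = fs.getFileStatus(hdfsJar);

        // Describe the uploaded jar as a LocalResource; the size and
        // timestamp must match the file in HDFS, or localization will fail.
        LocalResource jarResource = Records.newRecord(LocalResource.class);
        jarResource.setResource(ConverterUtils.getYarnUrlFromPath(hdfsJar));
        jarResource.setSize(jarStatus.getLen());
        jarResource.setTimestamp(jarStatus.getModificationTime());
        jarResource.setType(LocalResourceType.FILE);
        jarResource.setVisibility(LocalResourceVisibility.APPLICATION);

        // Attach it to the AM's launch context; the NodeManager downloads it
        // into the container's working directory under the map key's name.
        ContainerLaunchContext amContainer =
            Records.newRecord(ContainerLaunchContext.class);
        amContainer.setLocalResources(
            Collections.singletonMap("large_jar.jar", jarResource));
        return amContainer;
      }
    }

 APPLICATION visibility keeps the localized copy private to this application; PRIVATE or PUBLIC would let the NodeManager share the downloaded copy across one user's applications or all applications on the node, respectively.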

hth,
Arun

On Dec 1, 2013, at 12:03 PM, Robert Metzger <[EMAIL PROTECTED]> wrote:

> Hello,
>
> I'm currently writing code to run my application on YARN (Hadoop 2.2.0).
> I used this code as a skeleton: https://github.com/hortonworks/simple-yarn-app
>
> Everything works fine on my local machine or on a cluster with shared directories, but when the resources live outside commonly accessible locations, my application fails.
>
> I have my application in a large jar file, containing everything (Submission Client, Application Master, and Workers).
> The submission client registers the large jar file as a local resource for the Application master's context.
>
> My understanding is that YARN takes care of transferring the client-local resources to the Application Master's container.
> This is also stated here: http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html
>
> You can use the LocalResource to add resources to your application request. This will cause YARN to distribute the resource to the ApplicationMaster node.
>
> If I start my jar from the path "/home/hadoop/robert/large_jar.jar", I get the following error from the NodeManager (another node in the cluster):
>
> 2013-12-01 20:13:00,810 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Failed to download rsrc { { file:/home/hadoop/robert/large_jar.jar, ..
>
> So it seems that this node tries to access the file from its own local file system.
>
> Do I have to use another "protocol" for the file, something like "file://host:port/home/blabla"?
>
> Is it true that YARN is able to distribute files (without using HDFS, obviously)?
>
>
> The distributedshell example suggests that I have to use HDFS: https://github.com/apache/hadoop-common/blob/50f0de14e377091c308c3a74ed089a7e4a7f0bfe/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java
>
>
> Sincerely,
> Robert

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
