Symlinks for cacheArchives

From the docs (for 0.20) for DistributedCache [1], I'm under the
impression that .tgz files will be unzipped, untarred, and symlinked
into the job's current directory.

However, when running the job, this little fragment [2] reveals (I
have called DistributedCache.createSymlink(config_); just after
adding the cache components)


But having inspected the ls -lR of the working directory, I don't see
this happening (only mscript.sh was symlinked, it was added via

ls -lR
total 12
lrwxrwxrwx 1 mapred mapred   90 Apr 28 22:11 job.jar ->
lrwxrwxrwx 1 mapred mapred  141 Apr 28 22:11 mscript.sh ->
drwxr-xr-x 2 mapred mapred 4096 Apr 28 22:11 tmp
total 0
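
The same check can also be made from inside the task rather than with ls. Below is a minimal sketch using only java.nio.file. It assumes Java 7 or later and that the task's working directory is the process's current directory; the class name SymlinkLister and the method listSymlinks are hypothetical, not part of Hadoop.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

public class SymlinkLister {
    /** Returns entry name -> link target for every symbolic link directly inside dir. */
    static Map<String, String> listSymlinks(Path dir) throws IOException {
        Map<String, String> links = new TreeMap<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                // Only symbolic links are reported; plain files and directories are skipped.
                if (Files.isSymbolicLink(entry)) {
                    links.put(entry.getFileName().toString(),
                              Files.readSymbolicLink(entry).toString());
                }
            }
        }
        return links;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: when run from a task, "." is the task's working directory.
        for (Map.Entry<String, String> e : listSymlinks(Paths.get(".")).entrySet()) {
            System.out.println("Link=" + e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Calling listSymlinks(Paths.get(".")) from the mapper's setup would print one line per symlink, so a missing archive link shows up as a missing line.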

In summary:

- I added mscript.sh via addCacheFile - symlinked into the working directory. OK
- I added a JAR file with some classes I needed, using
addArchiveToClassPath, and this worked too - OK
- I added a tgz file, hoping it would be untarred, unzipped and
symlinked in the current folder (using addCacheArchive) - NOT-OK
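
One thing that may be worth checking against the list above: as far as the DistributedCache documentation [1] describes it, the name of the symlink is taken from the `#name` fragment of the cache URI. The following is a minimal JDK-only sketch that prints the fragment a given cache URI carries. The class name CacheUriCheck, the method symlinkName, and the path /myapp/data.tgz are hypothetical; the Hadoop calls appear only as comments, and whether a missing fragment explains the behaviour seen here is an assumption, not a confirmed diagnosis.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class CacheUriCheck {
    /** Returns the part after '#' in the URI, or null if the URI has no fragment. */
    static String symlinkName(String cacheUri) throws URISyntaxException {
        return new URI(cacheUri).getFragment();
    }

    public static void main(String[] args) throws URISyntaxException {
        // Hypothetical paths, for illustration only.
        String plain = "/myapp/data.tgz";
        String named = "/myapp/data.tgz#data";
        System.out.println(plain + " -> " + symlinkName(plain));
        System.out.println(named + " -> " + symlinkName(named));
        // In the job setup the URI would then be handed to the cache, e.g.:
        //   DistributedCache.addCacheArchive(new URI(named), conf);
        //   DistributedCache.createSymlink(conf);
    }
}
```

Printing the URIs actually passed to addCacheArchive this way shows quickly whether each one requests a link name at all.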

Have I missed anything?


[1] http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/filecache/DistributedCache.html
[2] Path[] localArchives = DistributedCache.getLocalCacheArchives(context.getConfiguration());
Path[] localFiles = DistributedCache.getLocalCacheFiles(context.getConfiguration());
for (Path p : localArchives) System.out.println("Arch=" + p);
for (Path p : localFiles) System.out.println("File=" + p);