Hadoop, mail # user - Symlinks for cacheArchives


Symlinks for cacheArchives
Saptarshi Guha 2011-04-29, 05:23
Hello,

From the docs (for 0.20) for DistributedCache [1], I'm under the
impression that .tgz files will be unzipped, untarred, and symlinked
into the job's current directory.

However, when running the job, this little fragment [2] prints the
following (I called DistributedCache.createSymlink(config_); just
after adding the cache components):

Arch=/data01/hadoop/mapred/mapred/taskTracker/distcache/5775566659502863353_-129792898_530471609/a.X.com/user/sguha/tmp/rhipe-hbase.jar
Arch=/data01/hadoop/mapred/mapred/taskTracker/distcache/5324957355881422466_25039836_529778096/a.X.com/user/sguha/Rdist.tar.gz
File=/data01/hadoop/mapred/mapred/taskTracker/distcache/1213508244132138160_-278348214_531319237/a.X.com/user/sguha/mscript.sh

But having inspected the ls -lR of the working directory, I don't see
this happening (only mscript.sh was symlinked; it was added via
addCacheFile):

ls -lR
.:
total 12
lrwxrwxrwx 1 mapred mapred   90 Apr 28 22:11 job.jar -> /data01/hadoop/mapred/mapred/taskTracker/sguha/jobcache/job_201102231451_6814/jars/job.jar
lrwxrwxrwx 1 mapred mapred  141 Apr 28 22:11 mscript.sh -> /data01/hadoop/mapred/mapred/taskTracker/distcache/1213508244132138160_-278348214_531319237/a.X.com/user/sguha/mscript.sh
drwxr-xr-x 2 mapred mapred 4096 Apr 28 22:11 tmp
./tmp:
total 0
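
For what it's worth, the same check can be done from inside the task instead of from a shell. This is only a minimal sketch using java.nio.file, assuming the task's working directory is the JVM's current directory; WorkDirLinks and listSymlinks are hypothetical names of my own, not part of Hadoop:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WorkDirLinks {
    /** Returns "name -> target" for every symlink directly inside dir, sorted by name. */
    public static List<String> listSymlinks(Path dir) throws IOException {
        List<Path> entries;
        try (Stream<Path> s = Files.list(dir)) {
            entries = s.sorted().collect(Collectors.toList());
        }
        List<String> out = new ArrayList<>();
        for (Path p : entries) {
            if (Files.isSymbolicLink(p)) {
                out.add(p.getFileName() + " -> " + Files.readSymbolicLink(p));
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the task's working directory is the JVM's current directory.
        for (String line : listSymlinks(Paths.get("."))) {
            System.out.println("Link=" + line);
        }
    }
}
```

Calling listSymlinks(Paths.get(".")) from the mapper's setup would show which cache entries actually got a link, without needing shell access to the task node.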

In summary:

- I added mscript.sh via addCacheFile: symlinked into the working directory. OK
- I added a JAR file with some classes I needed, using
addArchiveToClassPath, and this worked too. OK
- I added a tgz file using addCacheArchive, hoping it would be unzipped,
untarred, and symlinked into the current folder. NOT-OK

Have I missed anything?

Cheers
Joy

[1] http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/filecache/DistributedCache.html
[2] Path[] localArchives = DistributedCache.getLocalCacheArchives(context.getConfiguration());
Path[] localFiles = DistributedCache.getLocalCacheFiles(context.getConfiguration());
for (Path p : localArchives) System.out.println("Arch=" + p);
for (Path p : localFiles) System.out.println("File=" + p);
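
A further check on the Arch= paths printed by [2] would be to see whether each one is a directory or a plain file on the local disk. This is a rough sketch only: it assumes (I have not confirmed this from the docs in [1]) that an unpacked archive shows up as a directory at the reported path. CachePathKind and describe are hypothetical names, and the paths are passed in as plain strings:

```java
import java.io.File;

public class CachePathKind {
    /** Classifies a local path as "directory", "file", or "missing". */
    public static String describe(String localPath) {
        File f = new File(localPath);
        if (f.isDirectory()) return "directory";
        if (f.isFile()) return "file";
        return "missing";
    }

    public static void main(String[] args) {
        // Pass the Arch= / File= paths from the task log as arguments.
        for (String p : args) {
            System.out.println(describe(p) + " " + p);
        }
    }
}
```

Inside a task, the same call could be made as describe(p.toString()) for each Path returned by getLocalCacheArchives.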
Saptarshi Guha 2011-04-29, 15:57