Re: streaming cacheArchive shared libraries
Ramya Sunil 2011-08-05, 17:44
I tried the exact use case you described and it works fine for me.
Here are the commands I ran:
[ramya]$ jar vxf samplelib.jar
[ramya]$ hadoop dfs -put samplelib.jar samplelib.jar
[ramya]$ hadoop jar hadoop-streaming.jar -input InputDir -mapper "ls testlink/libhdfs.so" -reducer NONE -output out -cacheArchive
[ramya]$ hadoop dfs -cat out/*
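Note that the mapper lists testlink/libhdfs.so, not libhdfs.so: with -cacheArchive the archive is unpacked into a directory that is reachable through the symlink name, so the .so does not land directly in the task's working directory. If the streaming executable needs to link against it, one option is a small wrapper script that adds that directory to the loader path. The following is only a sketch: it assumes the symlink is named testlink (as in the mapper above), and my_mapper is a hypothetical name standing in for the real executable.

```shell
#!/bin/sh
# Hypothetical wrapper for a streaming mapper (a sketch, not a tested setup).
# Assumption: the archive is reachable through a symlink named "testlink"
# in the task's working directory, as in the "ls testlink/libhdfs.so" mapper.

# Prepend a directory to a colon-separated search path without leaving
# a trailing colon when the existing path is empty.
prepend_path() {
    dir=$1
    current=$2
    if [ -n "$current" ]; then
        printf '%s:%s\n' "$dir" "$current"
    else
        printf '%s\n' "$dir"
    fi
}

# Directory where the unpacked archive contents are expected to appear.
LINK_DIR="$PWD/testlink"

# Let the dynamic loader search the unpacked archive directory first.
LD_LIBRARY_PATH=$(prepend_path "$LINK_DIR" "${LD_LIBRARY_PATH:-}")
export LD_LIBRARY_PATH

# "my_mapper" is a placeholder for the real streaming executable;
# it is only run if it is actually present and executable.
if [ -x ./my_mapper ]; then
    exec ./my_mapper "$@"
fi
```

The wrapper and the executable would then be shipped with the job and the wrapper named as the -mapper. Whether this is needed depends on how the executable locates its libraries.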
Hope it helps.
On 8/5/11 10:10 AM, "Keith Wiley" <[EMAIL PROTECTED]> wrote:
I can use cacheFile to load .so files into the distributed cache and it
works fine (the streaming executable links against the .so and runs), but I
can't get it to work with -cacheArchive. It always says it can't find the
.so file. I realize that if you jar a directory, the directory will be
recreated when you unjar, but I've tried jarring a file directly. It is
easily verified that unjarring such a file reproduces the original file as a
sibling of the jar file itself. So it seems to me that cacheArchive should
have transferred the jar file to the cwd of my task, unjarred it, and
produced a .so file right there, but it doesn't link up with the executable.
Like I said, I know this basic approach works just fine with cacheFile.
What could be the problem here? I can't easily see the files on the cluster
since it is a remote cluster with limited access. I don't believe I can ssh
to any individual machine to investigate the files that are created for a
task...but I think I have worked through the process logically and I'm not
sure what I'm doing wrong.
Keith Wiley *[EMAIL PROTECTED]* keithwiley.com
"Luminous beings are we, not this crude matter."