Re: DistributedCache is empty
Vinod Kumar Vavilapalli 2014-01-17, 17:46
What is the version of Hadoop that you are using?
On Jan 16, 2014, at 2:41 PM, Keith Wiley <[EMAIL PROTECTED]> wrote:
> My driver is implemented around Tool, so it should be wrapping GenericOptionsParser internally. Nevertheless, neither -files nor the DistributedCache methods seem to work. Usage on the command line is straightforward: I simply add "-files foo.py,bar.py" right after the class name (those files are in the current directory I'm running hadoop from, i.e., on the local, non-HDFS filesystem). The mapper then inspects the file list via DistributedCache.getLocalCacheFiles(context.getConfiguration()) and doesn't see the files; there's nothing there. Likewise, if I attempt to run those Python scripts from the mapper using hadoop.util.Shell, the files obviously can't be found.
> That should have worked, so I shouldn't have to rely on the DistributedCache methods, but I tried them anyway. In the driver I create a new Configuration, then call DistributedCache.addCacheFile(new URI("./foo.py"), conf), thus referencing the local, non-HDFS file in the current working directory. I then pass conf to the Job constructor, which seems straightforward. Still no dice: the mapper can't see the files; they simply aren't there.
> What on Earth am I doing wrong here?
> Keith Wiley [EMAIL PROTECTED] keithwiley.com music.keithwiley.com
> "Luminous beings are we, not this crude matter."
> -- Yoda