I am working on a project that requires executing a Hadoop job remotely,
and the job depends on some third-party libraries (jar files).
Based on my understanding, I tried:
1. Copy these jar files to HDFS.
2. Add them to the distributed cache using
DistributedCache.addFileToClassPath, so that Hadoop can ship these jar
files to each of the slave nodes.
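For reference, the two steps above look roughly like this in my client code (the HDFS paths, addresses, and job name are illustrative, not my real ones; this is a sketch against the old Hadoop 1.x `mapred`-era API and assumes the Hadoop jars are on the client classpath):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class RemoteJobSubmitter {
    public static void main(String[] args) throws Exception {
        // Point the client at the remote cluster (addresses are placeholders).
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://namenode:9000");
        conf.set("mapred.job.tracker", "jobtracker:9001");

        // Step 1: copy the third-party jar from the local machine to HDFS.
        FileSystem fs = FileSystem.get(conf);
        Path jarOnHdfs = new Path("/libs/third-party.jar"); // illustrative path
        fs.copyFromLocalFile(new Path("lib/third-party.jar"), jarOnHdfs);

        // Step 2: register the jar on the task classpath via the
        // distributed cache. This must happen on the same Configuration
        // that the Job is created from, before the job is submitted.
        DistributedCache.addFileToClassPath(jarOnHdfs, conf);

        Job job = new Job(conf, "my-remote-job"); // illustrative job name
        // ... set mapper, reducer, input/output paths, then:
        // job.waitForCompletion(true);
    }
}
```

One detail I am unsure about is whether addFileToClassPath needs the path qualified with the filesystem URI for a remote cluster, or whether the bare HDFS path above is enough.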
However, my program still throws a ClassNotFoundException, indicating that
some of the classes cannot be found while the job is running.
So I'm wondering:
1. What is the correct way to run a job remotely and programmatically when
the job requires third-party jar files?
2. I found that DistributedCache is deprecated (I'm using Hadoop 1.2.0). What
is the alternative class?