Re: UDF with dependency on external jars & native code
You can use the MapReduce distributed cache to push the native libs; see
http://hadoop.apache.org/common/docs/r0.20.1/mapred_tutorial.html#DistributedCache

"The DistributedCache can also be used to distribute both jars and native
libraries for use in the map and/or reduce tasks. The child-jvm always has
its current working directory added to the java.library.path and
LD_LIBRARY_PATH. And hence the cached libraries can be loaded via
System.loadLibrary or System.load. More details on how to load shared
libraries through distributed cache are documented at
native_libraries.htm"

So passing -Dmapred.cache.files=<dfs path to file> on your pig command line
should work.
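
On the UDF side, the library still has to be loaded explicitly. Here is a
rough sketch of how that might look, assuming the cached file ends up in the
task's current working directory under its platform library name. The class
NativeLibLoader and the library name "mylib" are made up for illustration;
they are not part of Pig or Hadoop.

```java
import java.io.File;

// Hypothetical helper (not part of Pig or Hadoop): loads a native library
// that the distributed cache is assumed to have placed in the task's
// current working directory.
public class NativeLibLoader {

    // Build the expected file for a library name in a given directory,
    // e.g. "mylib" -> <dir>/libmylib.so on Linux.
    static File candidate(String libName, File dir) {
        return new File(dir, System.mapLibraryName(libName));
    }

    // Try the working directory first, then fall back to java.library.path.
    // Returns a description of what was loaded; throws UnsatisfiedLinkError
    // if the library cannot be found either way.
    static String loadFromTaskDir(String libName) {
        File cwd = new File(System.getProperty("user.dir"));
        File local = candidate(libName, cwd);
        if (local.isFile()) {
            System.load(local.getAbsolutePath());
            return local.getAbsolutePath();
        }
        System.loadLibrary(libName);
        return libName;
    }
}
```

A UDF could call NativeLibLoader.loadFromTaskDir("mylib") from a static
initializer, so the library is loaded once per task JVM. For this to find the
file, the name it has in the working directory must match what
System.mapLibraryName returns (libmylib.so on Linux).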

Please let us know if this worked for you.

For the jars, you can also use a command-line option:
-Dpig.additional.jars="jar1:jar2.."

(thanks to Pradeep for suggesting this solution)

Thanks,
Thejas

On 7/26/10 9:38 AM, "Kaluskar, Sanjay" <[EMAIL PROTECTED]> wrote:

> I am new to PIG and running into a fairly basic problem. I have a UDF
> which depends on some other 3rd party jars & libraries. I can call the
> UDF from my PIG script either from grunt or by running "java -cp ...
> org.apache.pig.Main <script>" in local mode, when I have the jars on the
> classpath and the libraries on LD_LIBRARY_PATH. But, in mapreduce mode I
> get errors from Hadoop because it doesn't find the classes & libraries.
>
> I saw another thread on this forum, which had a workaround for the jar.
> I can explicitly call register on the dependency, and that seems to fix
> the problem. But, there doesn't seem to be a way of specifying the
> native libraries to PIG such that the map/reduce jobs are set up to
> access them.
>
> I am using PIG 0.5.0. Any help is appreciated!
>
> Thanks,
> -sanjay
>