Re: UDF with dependency on external jars & native code
You can use the MapReduce distributed cache to push the native libs; see
http://hadoop.apache.org/common/docs/r0.20.1/mapred_tutorial.html#DistributedCache

"The DistributedCache can also be used to distribute both jars and native
libraries for use in the map and/or reduce tasks. The child-jvm always has
its current working directory added to the java.library.path and
LD_LIBRARY_PATH. And hence the cached libraries can be loaded via
System.loadLibrary or System.load. More details on how to load shared
libraries through distributed cache are documented at
native_libraries.htm"
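
As a rough illustration of the loading step the tutorial describes, here is a minimal Java sketch. It assumes the shared library has already reached the task's working directory through the distributed cache. The class name NativeLibs and the library name "myudfnative" are hypothetical placeholders, not anything from Pig or Hadoop. It only uses System.loadLibrary and reports java.library.path on failure, which can help when diagnosing why a task cannot find the library.

```java
// Hypothetical helper; class and library names are illustrative only.
public final class NativeLibs {
    private NativeLibs() {}

    /**
     * Tries to load a shared library by its short name (for example "foo"
     * for libfoo.so on Linux) from the directories on java.library.path.
     * Returns true on success, false if the JVM cannot find or link it.
     */
    public static boolean tryLoad(String shortName) {
        try {
            System.loadLibrary(shortName);
            return true;
        } catch (UnsatisfiedLinkError e) {
            // Print what was searched so the failure is easier to diagnose.
            System.err.println("Could not load '" + shortName + "' ("
                    + System.mapLibraryName(shortName) + "): " + e.getMessage());
            System.err.println("java.library.path="
                    + System.getProperty("java.library.path"));
            return false;
        }
    }

    public static void main(String[] args) {
        // "myudfnative" is a placeholder; pass the real short name as an argument.
        String name = args.length > 0 ? args[0] : "myudfnative";
        System.out.println(tryLoad(name) ? "loaded" : "not loaded");
    }
}
```

A UDF could call something like NativeLibs.tryLoad from a static initializer so the load is attempted once per task JVM; whether that fits depends on how the third-party jar loads its own native code.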

So passing -Dmapred.cache.files=<dfs path to file> on your Pig command line
should work.

Please let us know if this worked for you.

For the jars, you can also use a command-line option:
-Dpig.additional.jars="jar1:jar2.."
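
To make the two option formats concrete, here is a small Java sketch that assembles them from lists of paths. It assumes mapred.cache.files takes a comma-separated list of DFS paths and that pig.additional.jars takes a colon-separated list, as in the "jar1:jar2" form above. The class name PigOptions and every path shown are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that only formats option strings; it does not run Pig.
public final class PigOptions {
    private PigOptions() {}

    /** Assumes mapred.cache.files is a comma-separated list of DFS paths. */
    public static String cacheFilesOption(List<String> dfsPaths) {
        return "-Dmapred.cache.files=" + String.join(",", dfsPaths);
    }

    /** pig.additional.jars is colon-separated, as in "jar1:jar2". */
    public static String additionalJarsOption(List<String> jars) {
        return "-Dpig.additional.jars=" + String.join(":", jars);
    }

    public static void main(String[] args) {
        // All paths below are placeholders for illustration.
        System.out.println(cacheFilesOption(
                Arrays.asList("/user/me/libs/libfoo.so", "/user/me/libs/libbar.so")));
        System.out.println(additionalJarsOption(
                Arrays.asList("/local/jars/dep1.jar", "/local/jars/dep2.jar")));
    }
}
```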

(thanks to Pradeep for suggesting this solution)

Thanks,
Thejas

On 7/26/10 9:38 AM, "Kaluskar, Sanjay" <[EMAIL PROTECTED]> wrote:

> I am new to PIG and running into a fairly basic problem. I have a UDF
> which depends on some other 3rd party jars & libraries. I can call the
> UDF from my PIG script either from grunt or by running "java -cp ...
> org.apache.pig.Main <script>" in local mode, when I have the jars on the
> classpath and the libraries on LD_LIBRARY_PATH. But, in mapreduce mode I
> get errors from Hadoop because it doesn't find the classes & libraries.
>
> I saw another thread on this forum, which had a workaround for the jar.
> I can explicitly call register on the dependency, and that seems to fix
> the problem. But, there doesn't seem to be a way of specifying the
> native libraries to PIG such that the map/reduce jobs are set up to
> access them.
>
> I am using PIG 0.5.0. Any help is appreciated!
>
> Thanks,
> -sanjay
>