Pig >> mail # user >> UDF with dependency on external jars & native code


Re: UDF with dependency on external jars & native code

On 8/4/10 3:13 AM, "Kaluskar, Sanjay" <[EMAIL PROTECTED]> wrote:

> The register isn't working after I made some changes to mapred-site.xml.
> Right now I am executing the Pig script from the command line as follows:
>

Do you know which change in mapred-site.xml caused it to stop working? Was it
after adding mapred.cache.archives?

 
> PigudfException is an exception defined in one of the jars on the
> classpath of infapig.jar.

Is PigudfException also packaged within the jar?
-Thejas

> -----Original Message-----
> From: Thejas M Nair [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, July 29, 2010 11:16 PM
> To: [EMAIL PROTECTED]; Kaluskar, Sanjay
> Subject: Re: UDF with dependency on external jars & native code
>
> You can use the MR distributed cache to push the native libs - see -
> http://hadoop.apache.org/common/docs/r0.20.1/mapred_tutorial.html#DistributedCache
>
> "The DistributedCache can also be used to distribute both jars and
> native libraries for use in the map and/or reduce tasks. The child-jvm
> always has its current working directory added to the java.library.path
> and LD_LIBRARY_PATH. And hence the cached libraries can be loaded via
> System.loadLibrary or System.load. More details on how to load shared
> libraries through distributed cache are documented at
> native_libraries.htm"
>
> So using -Dmapred.cache.files=<dfs path to file> on your pig
> command line should work.
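A minimal sketch of such an invocation (the HDFS path, library name, and script name are hypothetical; the `#name` suffix requests a symlink of that name in the task's working directory, which on these Hadoop versions also needs mapred.create.symlink):

```shell
# Hypothetical example: ship a native library from HDFS to every task
# via the distributed cache. The "#mylib.so" fragment names the symlink
# created in each task's working directory, which is already on
# java.library.path and LD_LIBRARY_PATH.
pig -Dmapred.cache.files=hdfs:///user/sanjay/libs/mylib.so#mylib.so \
    -Dmapred.create.symlink=yes \
    myscript.pig
```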
>
> Please let us know if this worked for you.
>
> For the jars, you can also use a command-line option -
> -Dpig.additional.jars="jar1:jar2.."
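For instance (jar paths are hypothetical; pig.additional.jars takes a colon-separated list, and the jars are shipped to the backend alongside the Pig script's own jar):

```shell
# Hypothetical jar paths; colon-separated, same syntax as a classpath entry.
pig -Dpig.additional.jars=/path/to/infapig.jar:/path/to/third-party-dep.jar \
    myscript.pig
```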
>
> (thanks to Pradeep for suggesting this solution)
>
> Thanks,
> Thejas
>
> On 7/26/10 9:38 AM, "Kaluskar, Sanjay" <[EMAIL PROTECTED]>
> wrote:
>
>> I am new to Pig and running into a fairly basic problem. I have a UDF
>> which depends on some other third-party jars & libraries. I can call the
>> UDF from my Pig script either from grunt or by running "java -cp ...
>> org.apache.pig.Main <script>" in local mode, when I have the jars on
>> the classpath and the libraries on LD_LIBRARY_PATH. But, in mapreduce
>> mode I get errors from Hadoop because it doesn't find the classes &
>> libraries.
>>
>> I saw another thread on this forum, which had a workaround for the
>> jar.
>> I can explicitly call register on the dependency, and that seems to
>> fix the problem. But, there doesn't seem to be a way of specifying the
>> native libraries to PIG such that the map/reduce jobs are set up to
>> access them.
>>
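The REGISTER workaround described above looks roughly like this (a sketch with hypothetical jar paths and a hypothetical UDF class; each registered jar is shipped to the backend so the map/reduce tasks can load its classes):

```shell
# Sketch: write a Pig script that registers the UDF jar and its
# third-party dependency (paths and class names are hypothetical),
# so both jars travel to the map/reduce tasks.
cat > udf_job.pig <<'EOF'
REGISTER /path/to/infapig.jar;
REGISTER /path/to/third-party-dep.jar;
A = LOAD 'input' AS (line:chararray);
B = FOREACH A GENERATE com.example.MyUdf(line);
STORE B INTO 'output';
EOF
```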
>> I am using Pig 0.5.0. Any help is appreciated!
>>
>> Thanks,
>> -sanjay
>>