Re: Distributed Cache in Pig0.7
Felix,

Pig 0.7 does not support using the distributed cache from within UDFs. Is there
a reason you are using such an old version of Pig?

Pig 0.9 and later support this. Alan's book (Programming Pig) covers how to do
it well: http://ofps.oreilly.com/titles/9781449302641/writing_udfs.html

Thanks,
Prashant
On Fri, Mar 16, 2012 at 5:32 PM, felix gao <[EMAIL PROTECTED]> wrote:

> I need to put a small shared file in the distributed cache so I can load it in
> my UDF in Pig 0.7. We are using Hadoop 0.20.2+228. I tried running it with
>
>
> PIG_OPTS="-Dmapred.cache.archives=hdfs://namenode.host:5001/user/gen/categories/exclude/2012-03-15/exclude-categories#excludeCategory
> -Dmapred.create.symlink=yes", runpig ~felix/testingr.pig
> and
>
> PIG_OPTS="-Dmapred.cache.files=hdfs://namenode.host:5001/user/gen/categories/exclude/2012-03-15/exclude-categories#excludeCategory
> -Dmapred.create.symlink=yes", runpig ~felix/testingr.pig
>
>
> When I run
>
> hadoop fs -ls hdfs://namenode.host:5001/user/gen/categories/exclude/2012-03-15/exclude-categories
>
> I do see the file there.
>
> However, on the UDF side I get:
> java.io.FileNotFoundException: excludeCategory (No such file or directory)
>    at java.io.FileInputStream.open(Native Method)
>    at java.io.FileInputStream.<init>(FileInputStream.java:106)
>    at java.io.FileInputStream.<init>(FileInputStream.java:66)
>    at java.io.FileReader.<init>(FileReader.java:41)
>
> What did I do wrong?
>
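A minimal sketch of the UDF side, in plain JDK Java. It assumes the mechanism the commands above rely on: Hadoop ships the file named before the # and, when symlinks are enabled, links it into each task's working directory under the name after the #, so the UDF opens that name as a relative path. The FileNotFoundException in the trace above is what this code throws when that link is missing. The class and method names (CacheSymlink, symlinkName, loadLines) are hypothetical. Registering the file with the distributed cache is not shown; in Pig 0.9 and later a UDF can return such URIs from EvalFunc.getCacheFiles(), which requires the Pig jar.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

public class CacheSymlink {

    /** Returns the local symlink name, i.e. the URI fragment after '#'. */
    static String symlinkName(String cacheUri) throws URISyntaxException {
        String fragment = new URI(cacheUri).getFragment();
        if (fragment == null || fragment.isEmpty()) {
            throw new IllegalArgumentException("no #symlink fragment in " + cacheUri);
        }
        return fragment;
    }

    /** Reads all lines of a file; a bare name resolves against the working directory. */
    static List<String> loadLines(String localName) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new FileReader(localName))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        } catch (FileNotFoundException e) {
            // The failure in the report: no symlink with this name exists
            // in the task's current working directory.
            throw new FileNotFoundException(localName + " not found in "
                    + System.getProperty("user.dir")
                    + "; was it shipped via the distributed cache?");
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        String uri = "hdfs://namenode.host:5001/user/gen/categories/exclude/"
                + "2012-03-15/exclude-categories#excludeCategory";
        // Prints the name the UDF should open locally: excludeCategory
        System.out.println(symlinkName(uri));
    }
}
```

Inside a UDF, loadLines(symlinkName(uri)) would typically run once, on first use, rather than once per tuple.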