Pig >> mail # user >> Distributed Cache in Pig0.7


Re: Distributed Cache in Pig0.7
Felix,

Pig 0.7 does not support the distributed cache within UDFs. Is there a
reason you are using such an old version of Pig?

Pig 0.9 and later support this. Alan's book has great info on doing it:
http://ofps.oreilly.com/titles/9781449302641/writing_udfs.html
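
For reference, here is a rough sketch of the 0.9+ approach. In Pig 0.9 and later, EvalFunc has a getCacheFiles() method that returns "hdfs-path#symlink" strings, and Pig ships those files to the task's working directory. To keep the sketch compilable without the Pig jar, the class below does not extend EvalFunc. The class name ExcludeCategoryLookup, the helper loadCategories, and the one-category-per-line file format are all assumptions for illustration, not anything from your setup.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; in a real Pig 0.9+ UDF this logic would live in a
// class that extends org.apache.pig.EvalFunc.
class ExcludeCategoryLookup {
    // Path and symlink name taken from the original message.
    static final String HDFS_PATH =
        "/user/gen/categories/exclude/2012-03-15/exclude-categories";
    static final String SYMLINK = "excludeCategory";

    private Set<String> excluded; // loaded lazily on first lookup

    // In a real UDF this would be an @Override of EvalFunc.getCacheFiles().
    public List<String> getCacheFiles() {
        return Collections.singletonList(HDFS_PATH + "#" + SYMLINK);
    }

    // Assumes one category per line; blank lines are skipped.
    static Set<String> loadCategories(String localPath) throws IOException {
        Set<String> result = new HashSet<String>();
        try (BufferedReader reader = new BufferedReader(new FileReader(localPath))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    result.add(trimmed);
                }
            }
        }
        return result;
    }

    // The body of exec() would call something like this per input tuple.
    public boolean isExcluded(String category) throws IOException {
        if (excluded == null) {
            // The symlink is expected in the task's working directory.
            excluded = loadCategories("./" + SYMLINK);
        }
        return category != null && excluded.contains(category);
    }

    public static void main(String[] args) {
        System.out.println(new ExcludeCategoryLookup().getCacheFiles());
    }
}
```

The file is read lazily on the first lookup rather than in the constructor, because the UDF is also instantiated on the client side, where the symlink does not exist.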

Thanks,
Prashant
On Fri, Mar 16, 2012 at 5:32 PM, felix gao <[EMAIL PROTECTED]> wrote:

> I need to put a small shared file on the distributed cache so I can load it
> in my UDF in Pig 0.7.  We are using Hadoop 0.20.2+228.  I tried to run it using
>
>
> PIG_OPTS="-Dmapred.cache.archives=hdfs://namenode.host:5001/user/gen/categories/exclude/2012-03-15/exclude-categories#excludeCategory
> -Dmapred.create.symlink=yes", runpig ~felix/testingr.pig
> and
>
> PIG_OPTS="-Dmapred.cache.files=hdfs://namenode.host:5001/user/gen/categories/exclude/2012-03-15/exclude-categories#excludeCategory
> -Dmapred.create.symlink=yes", runpig ~felix/testingr.pig
>
>
> When I do
>
> hadoop fs -ls hdfs://namenode.host:5001/user/gen/categories/exclude/2012-03-15/exclude-categories
>
> I do see the file there.
>
> However, on the UDF side I see
> java.io.FileNotFoundException: excludeCategory (No such file or directory)
>    at java.io.FileInputStream.open(Native Method)
>    at java.io.FileInputStream.<init>(FileInputStream.java:106)
>    at java.io.FileInputStream.<init>(FileInputStream.java:66)
>    at java.io.FileReader.<init>(FileReader.java:41)
>
> What did I do wrong?
>