NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, cassandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

Pig >> mail # user >> Apache Pig UDF and Distributed cache


Apache Pig UDF and Distributed cache
Hi All,
I am trying to use the distributed cache in my UDF. I have the following file in HDFS that I want available locally to all my map tasks:
hadoop dfs -ls /scratch/
-rw-r--r--   1 userid supergroup    size date time /scratch/id_lookup
In my Pig script I pass its path to the UDF as a parameter:

ProcessedUI = FOREACH A GENERATE myparser.myUDF(param1, param2, '/scratch/id_lookup');
Inside the UDF's exec() function I read that parameter:
 lookup_file = (String)input.get(2);
I have implemented getCacheFiles() as follows:
public List<String> getCacheFiles() {
    List<String> list = new ArrayList<String>(1);
    list.add(lookup_file + "#id_lookup");
    return list;
}
Then I try to read the file with standard Java I/O:
public void VectorizeData() throws IOException {
    BufferedReader brd = new BufferedReader(new FileReader("./id_lookup"));
    try {
        // read the lookup data with brd.readLine()
    } finally {
        brd.close();
    }
}
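For comparison, here is a stdlib-only sketch of loading the symlinked file once into an in-memory map. The class name IdLookupLoader and the "one key<TAB>value pair per line" format are assumptions for illustration; the post does not show what id_lookup actually contains.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; assumes id_lookup holds one "key<TAB>value" pair per line.
class IdLookupLoader {
    static Map<String, String> load(String path) throws IOException {
        Map<String, String> lookup = new HashMap<String, String>();
        // try-with-resources closes the reader even if a read fails
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    continue; // skip lines without a tab separator
                }
                lookup.put(line.substring(0, tab), line.substring(tab + 1));
            }
        }
        return lookup;
    }
}
```

Inside a map task the call would be IdLookupLoader.load("./id_lookup"), relying on the "#id_lookup" symlink requested in getCacheFiles(). Loading once per task and keeping the map avoids re-reading the file for every record.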

I think I am not using it correctly (maybe the paths are messed up). I get the following exception:
2013-12-11 11:09:50,821 [JobControl] ERROR org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:userid cause:java.io.FileNotFoundException: File does not exist: null
2013-12-11 11:09:51,291 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2013-12-11 11:09:51,301 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
Any help on this would be great!
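A likely cause, offered as a reading of the log rather than a confirmed diagnosis: Pig calls getCacheFiles() on the front end while it builds the MapReduce job, before exec() has run on any record. At that point lookup_file is still null, so the requested cache entry becomes "null#id_lookup", which matches "File does not exist: null". A common workaround is to pass the path to the UDF's constructor through DEFINE, so it is already set when getCacheFiles() runs. The sketch below assumes that approach; the class name MyUDF, the String return type, and the key/value file format are illustrative assumptions, and what the real exec() does with param1 and param2 is not shown in the post.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class MyUDF extends EvalFunc<String> {
    // HDFS path of the lookup file. It comes from the DEFINE statement, so it
    // is already set when Pig calls getCacheFiles() on the front end.
    private final String lookupFile;
    // Loaded lazily on the back end, the first time exec() runs in a task.
    private Map<String, String> lookup;

    public MyUDF(String lookupFile) {
        this.lookupFile = lookupFile;
    }

    @Override
    public List<String> getCacheFiles() {
        List<String> list = new ArrayList<String>(1);
        // "#id_lookup" names the symlink created in each task's working directory
        list.add(lookupFile + "#id_lookup");
        return list;
    }

    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        if (lookup == null) {
            lookup = loadLookup("./id_lookup");
        }
        // Hypothetical use: translate the first argument through the lookup table
        return lookup.get(input.get(0).toString());
    }

    // Assumes one "key<TAB>value" pair per line; lines without a tab are skipped.
    private static Map<String, String> loadLookup(String path) throws IOException {
        Map<String, String> result = new HashMap<String, String>();
        for (String line : Files.readAllLines(Paths.get(path), StandardCharsets.UTF_8)) {
            int tab = line.indexOf('\t');
            if (tab >= 0) {
                result.put(line.substring(0, tab), line.substring(tab + 1));
            }
        }
        return result;
    }
}
```

Assuming the class is compiled as myparser.MyUDF, the script side would become DEFINE myUDF myparser.MyUDF('/scratch/id_lookup'); followed by ProcessedUI = FOREACH A GENERATE myUDF(param1, param2); so the path no longer travels as a third exec() argument.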
     