Pig >> mail # dev >> Attach bag for each tuple and pass to UDF


Attach bag for each tuple and pass to UDF
Hi, I have two relations:
relation *rows* (>10GB)
relation *tinyDictionary* (<1MB)

I want to take each tuple from *rows*, attach the whole *tinyDictionary* to it,
and then pass both to a Python UDF:

result = FOREACH someRelation GENERATE udf.my_python_udf(single_row_from_*rows*, whole_*tinyDictionary*);

How can I do that?

There is a solution that uses DistributedCache, but I would like to avoid
writing Java. Also, *tinyDictionary* could be spread across several files,
which would make that approach hard to manage.
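
One way this might be sketched without touching DistributedCache or Java: collapse *tinyDictionary* into a single bag with GROUP ... ALL, CROSS it onto *rows*, and let the Python (Jython) UDF receive the bag as a list of tuples. Everything below is an assumption for illustration, not something confirmed in this thread: the file name udf.py, the field names key, k and v, and the idea that each dictionary tuple is a (key, value) pair. Since LOAD accepts a directory or glob, a dictionary split across several files could still be loaded as one relation.

```python
# udf.py -- minimal sketch of a Jython UDF for Pig (all names are hypothetical).
#
# Assumed Pig Latin side:
#   REGISTER 'udf.py' USING jython AS udf;
#   dict_all  = GROUP tinyDictionary ALL;   -- one tuple: (all, {every dictionary tuple})
#   with_dict = CROSS rows, dict_all;       -- each row now carries the whole bag
#   result    = FOREACH with_dict GENERATE
#                   udf.my_python_udf(rows::key, dict_all::tinyDictionary);

# Pig's Jython engine provides outputSchema; fall back to a no-op decorator
# so the file can also be imported and tested outside Pig.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap


@outputSchema("value:chararray")
def my_python_udf(key, dictionary_bag):
    """Look up `key` in the bag, which arrives as a list of (k, v) tuples."""
    if dictionary_bag is None:
        return None
    # Build a plain dict from the bag; avoids dict comprehensions so it
    # stays compatible with older Jython versions.
    lookup = dict((k, v) for (k, v) in dictionary_bag)
    return lookup.get(key)
```

Note that this sketch rebuilds the lookup dict for every row, which is only reasonable because the dictionary is tiny; caching it in a module-level variable would be one possible refinement.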
Daniel Dai 2013-10-23, 21:09
Pradeep Gollakota 2013-10-23, 22:32