Pig, mail # dev - Attach bag for each tuple and pass to UDF


Serega Sheypak 2013-10-21, 21:21
Hi, I have two relations:
relation *rows* (>10GB)
relation *tinyDictionary* (<1MB)

I want to take each tuple from *rows*, attach the whole of *tinyDictionary* to it as a bag,
and then pass both to a Python UDF:

result = FOREACH someRelation GENERATE udf.my_python_udf(single_row_from_rows, whole_tinyDictionary);
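For illustration only, here is a minimal sketch of what my_python_udf could look like on the Python side. It assumes (hypothetically) that the row arrives as a tuple whose first field is the lookup key, and that the dictionary bag arrives as a list of (key, value) tuples, which is how a Jython UDF normally sees a Pig bag. The output schema name and the lookup behaviour are made up for the example; the outputSchema fallback exists only so the file also runs as plain Python outside Pig.

```python
# Hypothetical Jython UDF. When registered with
#   REGISTER 'udf.py' USING jython AS udf;
# Pig supplies the outputSchema decorator. This fallback is a no-op that
# only lets the file run as plain Python (e.g. for local testing).
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def decorator(func):
            return func
        return decorator


@outputSchema("enriched:chararray")  # hypothetical output schema
def my_python_udf(row, dictionary_bag):
    """Look up the row's key in the tiny dictionary.

    Assumptions (hypothetical): `row` is a tuple whose first field is the
    lookup key; `dictionary_bag` is a list of (key, value) tuples.
    Returns the matching value, or None if the key is absent.
    """
    if row is None or dictionary_bag is None:
        return None
    # Rebuilt on every call: cheap for a bag under 1MB, but could be cached.
    lookup = dict((t[0], t[1]) for t in dictionary_bag)
    return lookup.get(row[0])
```

The dict is built from a generator rather than a dict comprehension so the sketch also stays valid on older Jython 2.5 runtimes, which lack dict comprehensions.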

How can I do that?

There is a solution using DistributedCache, but I would like to
avoid writing Java code.
Also, *tinyDictionary* could be split across several files, which would
make that approach harder to manage.
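To make the intended data flow concrete, here is a plain-Python simulation (not Pig code) of one common Pig-side pattern: load every dictionary file into a single relation (a Pig LOAD path can point at a directory or a glob, so several files are not a problem), GROUP it ALL so it collapses into exactly one tuple holding a single bag, then CROSS that one tuple with *rows* so every row carries the whole bag. The file contents, row values and helper names below are all hypothetical.

```python
from itertools import product

# Hypothetical dictionary split across several files (one list per file).
dictionary_files = [
    [("a", "alpha"), ("b", "beta")],
    [("c", "gamma")],
]
rows = [("a", 1), ("c", 2), ("z", 3)]


def group_all(relation):
    """Mimic GROUP ... ALL: a single tuple ('all', bag_of_every_tuple)."""
    return [("all", list(relation))]


def cross(left, right):
    """Mimic CROSS: every left tuple concatenated with every right tuple."""
    return [l + r for l, r in product(left, right)]


# Loading a directory or glob behaves like concatenating every file.
tiny_dictionary = [t for f in dictionary_files for t in f]
dict_all = group_all(tiny_dictionary)  # exactly one tuple
joined = cross(rows, dict_all)         # each row now carries the whole bag

# Each joined tuple is (key, n, 'all', bag): split it back into row and bag,
# then do the per-row lookup a UDF would perform.
results = [dict(t[3]).get(t[0]) for t in joined]
```

Because the grouped relation holds exactly one tuple, the cross product has exactly as many tuples as *rows*; for the real 10GB relation it may be worth comparing this against a replicated join on a constant key.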