Pig >> mail # user >> schema string in python UDF


schema string in python UDF
Small question: the Python UDF doc says that "variable names inside a schema string are not used anywhere, they just make the syntax identifiable to the parser" (https://pig.apache.org/docs/r0.9.0/udf.html#schemafunction). However, it looks like Pig picks up those field names and keeps them unless I override them.

For instance, if I have a Python UDF:

@outputSchema('a:int')
def my_udf(x):
    return 123

And a Pig script:

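-- Assumes the UDF above is saved as my_udfs.py (hypothetical file name).
REGISTER 'my_udfs.py' USING jython AS my_udfs;
raw = LOAD 'data.txt' USING PigStorage() AS (x:int);
with_udf = FOREACH raw GENERATE my_udfs.my_udf(x);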

Running DESCRIBE on with_udf gives me:

with_udf: {a: int}
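
For what it's worth, here is a rough sketch of what I assume the decorator does. The outputSchema below is a hypothetical stand-in that only records the schema string on the function, and field_names is a made-up helper with a naive split, not Pig's actual parser. If the real decorator works roughly like this, the whole string, field name included, is available to Pig when it builds the output schema, which would match the DESCRIBE output above.

```python
# Hypothetical stand-in for the outputSchema decorator that Pig provides to
# Jython UDFs; assumed here to just record the schema string on the function.
def outputSchema(schema_str):
    def wrap(func):
        func.outputSchema = schema_str
        return func
    return wrap


@outputSchema('a:int')
def my_udf(x):
    return 123


def field_names(schema_str):
    # Naive split for flat schemas such as 'a:int, b:chararray'.
    # Illustration only; this is not how Pig parses schemas.
    return [field.split(':')[0].strip() for field in schema_str.split(',')]


print(my_udf.outputSchema)               # a:int
print(field_names(my_udf.outputSchema))  # ['a']
```

The point is only that the name 'a' is part of the string the function carries around, so nothing stops the parser from keeping it.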

Is the doc incorrect there?

Thanks,
Doug