Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, cassandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Pig >> mail # user >> schema string in python UDF


schema string in python UDF
Small question: the Python UDF doc says that "variable names inside a schema string are not used anywhere, they just make the syntax identifiable to the parser" (https://pig.apache.org/docs/r0.9.0/udf.html#schemafunction). However, it looks like Pig picks up those field names and keeps them unless I override them.

For instance, if I have a Python UDF:

@outputSchema('a:int')
def my_udf(x):
    return 123

And a Pig script:

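-- Assumption: the UDF above is saved as my_udfs.py (the file name is a guess).
REGISTER 'my_udfs.py' USING jython AS my_udfs;
raw = LOAD 'data.txt' USING PigStorage() AS (x:int);
with_udf = FOREACH raw GENERATE my_udfs.my_udf(x);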

Running describe on with_udf gives me:

with_udf: {a: int}
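
The name `a` in that output matches the name in the schema string passed to the decorator. To look at what the decorator carries without running Pig, here is a minimal sketch in plain Python. The `outputSchema` below is a hypothetical stand-in that only records the schema string on the function, and `field_names` is a made-up helper; both are assumptions for illustration, not Pig's actual implementation.

```python
def outputSchema(schema_str):
    """Hypothetical stand-in for Pig's decorator: only records the schema string."""
    def wrap(func):
        # Attach the raw schema string to the function object and return it unchanged.
        func.outputSchema = schema_str
        return func
    return wrap


def field_names(schema_str):
    """Hypothetical helper: field names from a flat schema such as 'a:int, b:chararray'."""
    return [part.split(':')[0].strip() for part in schema_str.split(',')]


@outputSchema('a:int')
def my_udf(x):
    return 123


print(my_udf(0))                         # the UDF itself still returns 123
print(field_names(my_udf.outputSchema))  # names found in the schema string
```

This only handles flat schemas; nested tuple or bag schemas would need a real parser.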

Is the doc incorrect there?

Thanks,
Doug