Pig, mail # user - null:: prefix for field names and a WARN when registering a Python UDF


null:: prefix for field names and a WARN when registering a Python UDF
Alan Crosswell 2013-06-08, 19:01
Hello,

I'm new to Pig and have run into a couple of small problems that I'd
appreciate some help with. I'm using Pig 0.11.1, having moved up from 0.9.2,
which did not work correctly with my Python UDF.

My Python UDF file defines two functions with the following outputSchema
decorators:

@outputSchema("time:chararray,from_call:chararray,to_call:chararray,digis:chararray,gtype:chararray,gate:chararray,info:chararray,firsthop:chararray")
def aprs(l):
  ...

and

@outputSchema("latitude:double,longitude:double,ambiguity:double,course:double,speed:double")
def position(to_call,info):
  ...
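
For readers unfamiliar with the layout, here is a minimal sketch of how a UDF file of this kind is put together. It is not the real aprspig.py: the function split_pair and its two-field schema are hypothetical, and the fallback outputSchema decorator is only an assumption about how to make the file importable outside Pig (when the script is registered with "using jython", Pig provides its own decorator).

```python
# Fallback so the file can be exercised outside Pig. Under Pig's Jython
# engine the name outputSchema is already defined, so this branch is skipped.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            # Assumption: just record the schema string on the function.
            func.outputSchema = schema
            return func
        return wrap


# Hypothetical UDF: returns a tuple whose fields match the declared schema.
@outputSchema("left:chararray,right:chararray")
def split_pair(line):
    """Split 'a>b' into (a, b); return (None, None) if there is no '>'."""
    if line is None or ">" not in line:
        return (None, None)
    left, right = line.split(">", 1)
    return (left, right)
```

Because the function returns a tuple, it would be used with FLATTEN in the same way as myudf.aprs in the grunt session further down.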

When I register this UDF, an unexpected warning appears (the WARN line
below). I'm going to ignore it for now unless someone says it is important:

grunt> *Register 's3n://n2ygk/aprspig.py' using jython as myudf;*
2013-06-08 18:38:03,990 [main] INFO
 org.apache.hadoop.fs.s3native.NativeS3FileSystem - Opening
's3n://n2ygk/aprspig.py' for reading
2013-06-08 18:38:04,118 [main] INFO
 org.apache.hadoop.util.NativeCodeLoader - Loaded the native-hadoop library
2013-06-08 18:38:04,175 [main] INFO
 org.apache.pig.scripting.jython.JythonScriptEngine - created tmp
python.cachedir=/tmp/pig_jython_6851471253258374122
2013-06-08 18:38:08,576 [main] WARN
 org.apache.pig.scripting.jython.JythonScriptEngine -
pig.cmd.args.remainders is empty. This is not expected unless on testing.
2013-06-08 18:38:11,981 [main] INFO
 org.apache.pig.scripting.jython.JythonScriptEngine - Register scripting
UDF: myudf.position
2013-06-08 18:38:11,984 [main] INFO
 org.apache.pig.scripting.jython.JythonScriptEngine - Register scripting
UDF: myudf.aprs

The other strange thing is that *null::* gets prepended to each field name.
This is mostly an annoyance, but in the case of JsonStorage() it also
clutters the output unnecessarily. Is there a way to avoid it?

grunt> *aprs = FOREACH raw GENERATE FLATTEN(myudf.aprs(line));*
2013-06-08 01:06:37,324 [main] INFO
 org.apache.pig.scripting.jython.JythonFunction - Schema
'time:chararray,from_call:chararray,to_call:chararray,digis:chararray,gtype:chararray,gate:chararray,info:chararray,firsthop:chararray'
defined for func aprs
grunt> *DESCRIBE aprs;*
aprs: {null::time: chararray,null::from_call: chararray,null::to_call:
chararray,null::digis: chararray,null::gtype: chararray,null::gate:
chararray,null::info: chararray,null::firsthop: chararray}

Is my UDF defined or invoked incorrectly in a way that produces the null::
prefix, or is this expected behavior?

I'd appreciate any pointers on how to make it go away.

Thanks.
/a