Pig >> mail # user >> null:: prefix for field names and a WARN when registering a Python UDF


null:: prefix for field names and a WARN when registering a Python UDF
Hello,

I'm new to Pig and would appreciate help with a couple of small problems.
I'm using Pig 0.11.1, because 0.9.2 did not work correctly with my Python
UDF.

My Python UDF file defines two functions with the following outputSchema
decorators:

@outputSchema("time:chararray,from_call:chararray,to_call:chararray,digis:chararray,gtype:chararray,gate:chararray,info:chararray,firsthop:chararray")
def aprs(l):
  ...

and

@outputSchema("latitude:double,longitude:double,ambiguity:double,course:double,speed:double")
def position(to_call,info):
  ...
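
For anyone who wants to exercise decorators like these outside of Pig, here is a minimal sketch. It assumes that the outputSchema decorator is normally supplied by Pig's Jython engine and only needs to record the schema string on the function, so a stand-in is defined when the name is missing. The split_calls function is hypothetical, and the bodies of aprs() and position() are not reproduced here.

```python
# Stand-in for the decorator that Pig's Jython engine is assumed to supply.
# Assumption: recording the schema string on the function is sufficient
# for local experiments; this is not Pig's own implementation.
try:
    outputSchema  # already defined when the file is loaded by Pig
except NameError:
    def outputSchema(schema_def):
        def decorate(func):
            func.outputSchema = schema_def
            return func
        return decorate


# Hypothetical UDF for illustration only, not aprs() or position().
@outputSchema("from_call:chararray,to_call:chararray")
def split_calls(line):
    """Split a 'FROM>TO' string into a (from_call, to_call) tuple."""
    from_call, _, to_call = line.partition(">")
    return (from_call, to_call)


if __name__ == "__main__":
    # Show the recorded schema string and one sample result.
    print(split_calls.outputSchema)
    print(split_calls("N2YGK>APRS"))
```

Running the file with a regular Python interpreter prints the schema string and the sample tuple, which makes it easy to check a UDF's return shape against its declared schema before registering it in grunt.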

When I register this UDF an unexpected warning pops up which I'm going to
ignore for now (unless someone says this is important):

grunt> *Register 's3n://n2ygk/aprspig.py' using jython as myudf;*
2013-06-08 18:38:03,990 [main] INFO
 org.apache.hadoop.fs.s3native.NativeS3FileSystem - Opening
's3n://n2ygk/aprspig.py' for reading
2013-06-08 18:38:04,118 [main] INFO
 org.apache.hadoop.util.NativeCodeLoader - Loaded the native-hadoop library
2013-06-08 18:38:04,175 [main] INFO
 org.apache.pig.scripting.jython.JythonScriptEngine - created tmp
python.cachedir=/tmp/pig_jython_6851471253258374122
2013-06-08 18:38:08,576 [main] WARN
 org.apache.pig.scripting.jython.JythonScriptEngine -
pig.cmd.args.remainders is empty. This is not expected unless on testing.
2013-06-08 18:38:11,981 [main] INFO
 org.apache.pig.scripting.jython.JythonScriptEngine - Register scripting
UDF: myudf.position
2013-06-08 18:38:11,984 [main] INFO
 org.apache.pig.scripting.jython.JythonScriptEngine - Register scripting
UDF: myudf.aprs

The other strange thing is that *null::* gets prepended to each field name.
This is mostly annoying, but with JsonStorage() it also clutters the output
unnecessarily. Is there a way to resolve this?

grunt> *aprs = FOREACH raw GENERATE FLATTEN(myudf.aprs(line));*
2013-06-08 01:06:37,324 [main] INFO
 org.apache.pig.scripting.jython.JythonFunction - Schema
'time:chararray,from_call:chararray,to_call:chararray,digis:chararray,gtype:chararray,gate:chararray,info:chararray,firsthop:chararray'
defined for func aprs
grunt> *DESCRIBE aprs;*
aprs: {null::time: chararray,null::from_call: chararray,null::to_call:
chararray,null::digis: chararray,null::gtype: chararray,null::gate:
chararray,null::info: chararray,null::firsthop: chararray}

Is my UDF being defined or invoked incorrectly, resulting in the null::
prefix, or is this expected behavior?

It's only an annoyance, but I'd appreciate any pointers on how to make it go
away.

Thanks.
/a