Pig >> mail # user >> null:: prefix for field names and a WARN when registering a Python UDF


Re: null:: prefix for field names and a WARN when registering a Python UDF
Hi Alan,

>> When I register this UDF an unexpected warning pops up which I'm going
to ignore for now (unless someone says this is important):

Yes, you can usually ignore anything that is not an ERROR message. If these
messages annoy you, you can redirect stderr to a file (e.g. 2>errors.txt).

>> The other strange thing is *null::* gets prepended to each field name.
This is mostly annoying, and, in the case of JsonStorage(), clutters things
unnecessarily. Is there a way to resolve this?

"null::" is prepended because a Python UDF returns a tuple, but the tuple
is not given a name. So if you change the outputSchema of your UDF to
something like this:

@outputSchema("t:( < field schemas here > )")

You will see "t::" is prepended instead.
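
For illustration, here is a minimal sketch of what a named-tuple schema could look like in the UDF file. The schema is shortened to three of your fields, the parsing body is only a placeholder (your real aprs() logic is not shown in this thread), and the fallback decorator is an assumption so the file can also be run under plain Python for local testing. When the script is registered with "using jython", Pig supplies outputSchema itself.

```python
# Pig's Jython engine makes `outputSchema` available when the script is
# registered. The fallback below is a stand-in (an assumption) so the same
# file can be exercised with plain Python outside Pig.
try:
    outputSchema
except NameError:
    def outputSchema(schema_str):
        def wrap(func):
            # Record the schema string on the function, nothing more.
            func.outputSchema = schema_str
            return func
        return wrap


# Naming the returned tuple "t" is what turns the null:: prefix into t::.
@outputSchema("t:(time:chararray,from_call:chararray,to_call:chararray)")
def aprs(line):
    # Placeholder parsing, NOT the real APRS logic: split a tab-separated line
    # and return the first three values as a tuple.
    fields = line.split("\t")
    return tuple(fields[:3])
```

With this schema, DESCRIBE should list t::time, t::from_call, and so on, rather than the null:: names.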

You can also remove the prefix by adding another FOREACH that redefines
the name of every field with an AS clause. That is,

aprs = FOREACH raw GENERATE FLATTEN(myudf.aprs(line));
aprs_cleaned = FOREACH aprs GENERATE time AS time, from_call AS from_call,
<and other fields>;

This is somewhat annoying if there are a lot of fields, as in your example.
In fact, there is a JIRA issue to add a built-in UDF that removes the prefixes:
https://issues.apache.org/jira/browse/PIG-3088. I will probably rebase the
patch and get it committed.
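
In the meantime, one way to avoid typing the AS clauses by hand is to generate the FOREACH statement from the same schema string the UDF declares. This is only a sketch: field_names and rename_foreach are hypothetical helper names, and it assumes a flat "name:type,name:type" schema with no nested tuples or bags.

```python
def field_names(schema):
    """Extract field names from a flat 'name:type,name:type' schema string."""
    return [part.split(":", 1)[0].strip()
            for part in schema.split(",") if part.strip()]


def rename_foreach(target, source, schema):
    """Build a Pig FOREACH statement that re-declares every field with AS."""
    clauses = ", ".join("%s AS %s" % (name, name)
                        for name in field_names(schema))
    return "%s = FOREACH %s GENERATE %s;" % (target, source, clauses)


# Shortened version of the aprs schema from the original message.
schema = "time:chararray,from_call:chararray,to_call:chararray"
print(rename_foreach("aprs_cleaned", "aprs", schema))
```

The printed statement can be pasted into the Pig script after the FLATTEN step.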

Thanks,
Cheolsoo
On Sat, Jun 8, 2013 at 12:01 PM, Alan Crosswell <[EMAIL PROTECTED]> wrote:

> Hello,
>
> I'm new to Pig and am having a few small problems that I'd appreciate some
> help with. I'm using Pig-0.11.1 after 0.9.2 just plain didn't work right
> with my Python UDF.
>
> I am using a Python UDF that has two functions with the following
> outputSchema:
>
>
> @outputSchema("time:chararray,from_call:chararray,to_call:chararray,digis:chararray,gtype:chararray,gate:chararray,info:chararray,firsthop:chararray")
> def aprs(l):
>   ...
>
> and
>
>
> @outputSchema("latitude:double,longitude:double,ambiguity:double,course:double,speed:double")
> def position(to_call,info):
>   ...
>
> When I register this UDF an unexpected warning pops up which I'm going to
> ignore for now (unless someone says this is important):
>
> grunt> *Register 's3n://n2ygk/aprspig.py' using jython as myudf;*
> 2013-06-08 18:38:03,990 [main] INFO
>  org.apache.hadoop.fs.s3native.NativeS3FileSystem - Opening
> 's3n://n2ygk/aprspig.py' for reading
> 2013-06-08 18:38:04,118 [main] INFO
>  org.apache.hadoop.util.NativeCodeLoader - Loaded the native-hadoop library
> 2013-06-08 18:38:04,175 [main] INFO
>  org.apache.pig.scripting.jython.JythonScriptEngine - created tmp
> python.cachedir=/tmp/pig_jython_6851471253258374122
> 2013-06-08 18:38:08,576 [main] WARN
>  org.apache.pig.scripting.jython.JythonScriptEngine -
> pig.cmd.args.remainders is empty. This is not expected unless on testing.
> 2013-06-08 18:38:11,981 [main] INFO
>  org.apache.pig.scripting.jython.JythonScriptEngine - Register scripting
> UDF: myudf.position
> 2013-06-08 18:38:11,984 [main] INFO
>  org.apache.pig.scripting.jython.JythonScriptEngine - Register scripting
> UDF: myudf.aprs
>
> The other strange thing is *null::* gets prepended to each field name. This
> is mostly annoying, and, in the case of JsonStorage(), clutters
> things unnecessarily. Is there a way to resolve this?
>
> grunt> *aprs = FOREACH raw GENERATE FLATTEN(myudf.aprs(line));*
> 2013-06-08 01:06:37,324 [main] INFO
>  org.apache.pig.scripting.jython.JythonFunction - Schema
> 'time:chararray,from_call:chararray,to_call:chararray,digis:chararray,gtype:chararray,gate:chararray,info:chararray,firsthop:chararray'
> defined for func aprs
> grunt> *DESCRIBE aprs;*
> aprs: {null::time: chararray,null::from_call: chararray,null::to_call:
> chararray,null::digis: chararray,null::gtype: chararray,null::gate:
> chararray,null::info: chararray,null::firsthop: chararray}
>
> Is my UDF being defined or invoked incorrectly to result in the null:: or
> is this just a feature?
>
> This is just annoying but I'd appreciate any pointers on how to make it go
> away.
>
> Thanks.