Pig >> mail # user >> null:: prefix for field names and a WARN when registering a Python UDF


Re: null:: prefix for field names and a WARN when registering a Python UDF
Thanks, Cheolsoo. The bigger problem right now is that JsonStorage() blows up
while trying to write .pig_schema, so I've had to use PigStorage() and parse
the output later. That means the field names with null:: are not a problem for now.
/a
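
A minimal sketch of the "parse it later" step, assuming PigStorage() was used with its default tab delimiter and that the columns come out in the order of the aprs() outputSchema quoted further down. The helper name parse_pigstorage_line is hypothetical, and treating empty fields as nulls is an assumption about the data, not something stated in this thread.

```python
# Field names copied from the aprs() outputSchema quoted later in this thread.
APRS_FIELDS = ["time", "from_call", "to_call", "digis",
               "gtype", "gate", "info", "firsthop"]


def parse_pigstorage_line(line, fields=APRS_FIELDS, delimiter="\t"):
    """Map one line of PigStorage output to a dict keyed by plain field names.

    Hypothetical helper: assumes one record per line and no delimiter
    characters inside field values.
    """
    values = line.rstrip("\n").split(delimiter)
    if len(values) != len(fields):
        raise ValueError("expected %d fields, got %d" % (len(fields), len(values)))
    # Assumption: an empty field means null, so it is mapped to None here.
    return dict(zip(fields, (v if v != "" else None for v in values)))
```

Because the names are attached only at parse time, the null:: prefixes that Pig adds to the schema never reach the parsed records.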
On Sun, Jun 9, 2013 at 2:58 PM, Cheolsoo Park <[EMAIL PROTECTED]> wrote:

> Hi Alan,
>
> >> When I register this UDF an unexpected warning pops up which I'm going
> to ignore for now (unless someone says this is important):
>
> Yes, you can usually ignore everything except ERROR messages. If these
> messages annoy you, you can redirect stderr to a file (e.g. 2>errors.txt).
>
>
> >> The other strange thing is *null::* gets prepended to each field name.
> This is mostly annoying, and, in the case of JsonStorage(), clutters things
> unnecessarily. Is there a way to resolve this?
>
> "null::" is prepended because the Python UDF returns a tuple that has not
> been given a name. If you change the outputSchema of your UDF to something
> like this:
>
> @outputSchema("t:( < field schemas here > )")
>
> You will see "t::" is prepended instead.
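
As a rough illustration of the named-tuple schema above, here is the position() schema from the original question wrapped in a tuple called t. It assumes Pig supplies the outputSchema decorator when the script is registered with jython, as the original UDF appears to rely on. The fallback decorator exists only so the file can be loaded outside Pig, and the function body is a hypothetical placeholder because the real logic is elided in the thread.

```python
try:
    outputSchema  # assumed to be provided by Pig when registered with jython
except NameError:
    # Stand-in used only outside Pig: record the schema string on the function.
    def outputSchema(schema):
        def wrap(func):
            func.outputSchema = schema
            return func
        return wrap


# Same field list as in the question, now inside a tuple named "t".
@outputSchema("t:(latitude:double,longitude:double,ambiguity:double,"
              "course:double,speed:double)")
def position(to_call, info):
    # Hypothetical placeholder: the real parsing logic is not shown in the thread.
    return (None, None, None, None, None)
```

With this schema, the flattened fields would be expected to appear as t::latitude, t::longitude, and so on, rather than with the null:: prefix.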
>
> You can also remove the prefix by adding another FOREACH and redefining the
> names with AS clauses for every field. That is,
>
> aprs = FOREACH raw GENERATE FLATTEN(myudf.aprs(line));
> aprs_cleaned = FOREACH aprs GENERATE time AS time, from_call AS from_call,
> <and other fields>;
>
> This is somewhat annoying if there are a lot of fields, as in your example.
> In fact, there is a jira to add a built-in UDF that removes the prefixes:
> https://issues.apache.org/jira/browse/PIG-3088. I will probably rebase the
> patch and get it committed.
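
Until a built-in is available, one way to avoid typing the AS clauses by hand is to generate the renaming statement from the schema string. This is only a sketch: rename_statement is a hypothetical helper, it runs as a plain Python script outside Pig, and it handles only flat schemas of the form name:type,name:type (no nested tuples or bags).

```python
def rename_statement(schema, src="aprs", dst="aprs_cleaned"):
    """Build a Pig FOREACH statement that re-declares every field name.

    Hypothetical helper: 'schema' is a flat outputSchema string such as
    "time:chararray,from_call:chararray"; nested types are not supported.
    """
    names = [field.split(":")[0].strip() for field in schema.split(",")]
    clauses = ", ".join("%s AS %s" % (name, name) for name in names)
    return "%s = FOREACH %s GENERATE %s;" % (dst, src, clauses)


if __name__ == "__main__":
    # Schema string copied from the aprs() UDF in the original question.
    print(rename_statement(
        "time:chararray,from_call:chararray,to_call:chararray,"
        "digis:chararray,gtype:chararray,gate:chararray,"
        "info:chararray,firsthop:chararray"))
```

The printed statement can then be pasted into the Pig script after the FLATTEN step.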
>
> Thanks,
> Cheolsoo
>
> On Sat, Jun 8, 2013 at 12:01 PM, Alan Crosswell <[EMAIL PROTECTED]> wrote:
>
> > Hello,
> >
> > I'm new to Pig and am having a few small problems that I'd appreciate some
> > help with. I'm using Pig-0.11.1 after 0.9.2 just plain didn't work right
> > with my Python UDF.
> >
> > I am using a Python UDF that has two functions with the following
> > outputSchema:
> >
> >
> > @outputSchema("time:chararray,from_call:chararray,to_call:chararray,digis:chararray,gtype:chararray,gate:chararray,info:chararray,firsthop:chararray")
> > def aprs(l):
> >   ...
> >
> > and
> >
> >
> > @outputSchema("latitude:double,longitude:double,ambiguity:double,course:double,speed:double")
> > def position(to_call,info):
> >   ...
> >
> > When I register this UDF an unexpected warning pops up which I'm going to
> > ignore for now (unless someone says this is important):
> >
> > grunt> *Register 's3n://n2ygk/aprspig.py' using jython as myudf;*
> > 2013-06-08 18:38:03,990 [main] INFO
> >  org.apache.hadoop.fs.s3native.NativeS3FileSystem - Opening
> > 's3n://n2ygk/aprspig.py' for reading
> > 2013-06-08 18:38:04,118 [main] INFO
> >  org.apache.hadoop.util.NativeCodeLoader - Loaded the native-hadoop
> > library
> > 2013-06-08 18:38:04,175 [main] INFO
> >  org.apache.pig.scripting.jython.JythonScriptEngine - created tmp
> > python.cachedir=/tmp/pig_jython_6851471253258374122
> > 2013-06-08 18:38:08,576 [main] WARN
> >  org.apache.pig.scripting.jython.JythonScriptEngine -
> > pig.cmd.args.remainders is empty. This is not expected unless on testing.
> > 2013-06-08 18:38:11,981 [main] INFO
> >  org.apache.pig.scripting.jython.JythonScriptEngine - Register scripting
> > UDF: myudf.position
> > 2013-06-08 18:38:11,984 [main] INFO
> >  org.apache.pig.scripting.jython.JythonScriptEngine - Register scripting
> > UDF: myudf.aprs
> >
> > The other strange thing is *null::* gets prepended to each field name. This
> > is mostly annoying, and, in the case of JsonStorage(), clutters
> > things unnecessarily. Is there a way to resolve this?
> >
> > grunt> *aprs = FOREACH raw GENERATE FLATTEN(myudf.aprs(line));*
> > 2013-06-08 01:06:37,324 [main] INFO
> >  org.apache.pig.scripting.jython.JythonFunction - Schema
> > 'time:chararray,from_call:chararray,to_call:chararray,digis:chararray,gtype:chararray,gate:chararray,info:chararray,firsthop:chararray'