Pig >> mail # user >> null:: prefix for field names and a WARN when registering a Python UDF


Re: null:: prefix for field names and a WARN when registering a Python UDF
Thanks, Cheolsoo. The bigger problem right now is that JsonStorage() blows up
when it tries to write .pig_schema, so I've had to fall back to PigStorage() and
parse the output later. That makes the null:: field names a non-issue for now.
/a
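
For anyone taking the same workaround, parsing PigStorage() output afterwards might look roughly like the sketch below. It assumes the default tab delimiter and the eight field names from the aprs outputSchema quoted further down; parse_line is a hypothetical helper, not part of Pig, and it will reject lines whose values themselves contain tabs.

```python
# Field names taken from the aprs outputSchema quoted in this thread.
FIELDS = ["time", "from_call", "to_call", "digis",
          "gtype", "gate", "info", "firsthop"]


def parse_line(line, fields=FIELDS, delimiter="\t"):
    """Turn one line of PigStorage() output into a dict keyed by field name.

    Assumption: default tab delimiter, exactly one value per field.
    """
    values = line.rstrip("\n").split(delimiter)
    if len(values) != len(fields):
        raise ValueError("expected %d fields, got %d"
                         % (len(fields), len(values)))
    return dict(zip(fields, values))
```

Because the names are supplied here rather than read from Pig's schema, the null:: prefix never enters the picture.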
On Sun, Jun 9, 2013 at 2:58 PM, Cheolsoo Park <[EMAIL PROTECTED]> wrote:

> Hi Alan,
>
> >> When I register this UDF an unexpected warning pops up which I'm going
> to ignore for now (unless someone says this is important):
>
> Yes, you can usually ignore these messages; only ERROR messages need
> attention. If they annoy you a lot, you can redirect stderr to a file
> (e.g. 2>errors.txt).
>
>
> >> The other strange thing is *null::* gets prepended to each field name.
> This is mostly annoying, and, in the case of JsonStorage(), clutters things
> unnecessarily. Is there a way to resolve this?
>
> "null::" is prepended because a Python UDF returns a tuple, but the tuple
> is not given a name. So if you change the outputSchema of your UDF to
> something like this:
>
> @outputSchema("t:( < field schemas here > )")
>
> You will see "t::" is prepended instead.
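
A minimal sketch of what a named-tuple schema could look like in a UDF file. The function split_pair and its two fields are hypothetical stand-ins, not the aprs UDF from this thread. The fallback decorator is also an assumption: as far as I understand, Pig supplies outputSchema itself when the script is registered using jython, so the fallback only matters when the file is imported in plain Python.

```python
# Assumption: Pig provides outputSchema when the script is registered
# "using jython". The fallback lets the file be imported outside Pig too.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            func.outputSchema = schema  # record the schema string on the function
            return func
        return wrap


# Hypothetical UDF: the returned tuple is named "t", so after FLATTEN the
# fields would show up as t::first and t::second instead of null::...
@outputSchema("t:(first:chararray,second:chararray)")
def split_pair(line):
    first, _, second = line.partition(",")
    return (first, second)
```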
>
> You can also remove the prefix by adding another FOREACH that renames
> every field with an AS clause. That is,
>
> aprs = FOREACH raw GENERATE FLATTEN(myudf.aprs(line));
> aprs_cleaned = FOREACH aprs GENERATE time AS time, from_call AS from_call,
> <and other fields>;
>
> This is somewhat annoying when there are a lot of fields, as in your example.
> In fact, there is a JIRA issue to add a built-in UDF that removes the prefixes:
> https://issues.apache.org/jira/browse/PIG-3088. I will probably rebase the
> patch and get it committed.
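
Until a built-in like that is available, the prefix can also be dropped after the fact, outside Pig. The sketch below is one way to do it when post-processing records in Python; strip_prefix and strip_prefixes are hypothetical helper names, and the code simply assumes that everything up to the last "::" in a field name is a prefix.

```python
def strip_prefix(name):
    """Return the part of a Pig field name after the last '::'."""
    return name.rsplit("::", 1)[-1]


def strip_prefixes(record):
    """Copy a dict, dropping any 'alias::' prefix from its keys."""
    return dict((strip_prefix(key), value) for key, value in record.items())
```

For example, strip_prefix("null::time") gives "time", and a name with no prefix is returned unchanged.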
>
> Thanks,
> Cheolsoo
>
>
>
>
>
>
> On Sat, Jun 8, 2013 at 12:01 PM, Alan Crosswell <[EMAIL PROTECTED]> wrote:
>
> > Hello,
> >
> > I'm new to Pig and am having a few small problems that I'd appreciate some
> > help with. I'm using Pig-0.11.1 after 0.9.2 just plain didn't work right
> > with my Python UDF.
> >
> > I am using a Python UDF that has two functions with the following
> > outputSchema:
> >
> > @outputSchema("time:chararray,from_call:chararray,to_call:chararray,digis:chararray,gtype:chararray,gate:chararray,info:chararray,firsthop:chararray")
> > def aprs(l):
> >   ...
> >
> > and
> >
> > @outputSchema("latitude:double,longitude:double,ambiguity:double,course:double,speed:double")
> > def position(to_call,info):
> >   ...
> >
> > When I register this UDF an unexpected warning pops up which I'm going to
> > ignore for now (unless someone says this is important):
> >
> > grunt> *Register 's3n://n2ygk/aprspig.py' using jython as myudf;*
> > 2013-06-08 18:38:03,990 [main] INFO
> >  org.apache.hadoop.fs.s3native.NativeS3FileSystem - Opening
> > 's3n://n2ygk/aprspig.py' for reading
> > 2013-06-08 18:38:04,118 [main] INFO
> >  org.apache.hadoop.util.NativeCodeLoader - Loaded the native-hadoop library
> > 2013-06-08 18:38:04,175 [main] INFO
> >  org.apache.pig.scripting.jython.JythonScriptEngine - created tmp
> > python.cachedir=/tmp/pig_jython_6851471253258374122
> > 2013-06-08 18:38:08,576 [main] WARN
> >  org.apache.pig.scripting.jython.JythonScriptEngine -
> > pig.cmd.args.remainders is empty. This is not expected unless on testing.
> > 2013-06-08 18:38:11,981 [main] INFO
> >  org.apache.pig.scripting.jython.JythonScriptEngine - Register scripting
> > UDF: myudf.position
> > 2013-06-08 18:38:11,984 [main] INFO
> >  org.apache.pig.scripting.jython.JythonScriptEngine - Register scripting
> > UDF: myudf.aprs
> >
> > The other strange thing is *null::* gets prepended to each field name. This
> > is mostly annoying, and, in the case of JsonStorage(), clutters
> > things unnecessarily. Is there a way to resolve this?
> >
> > grunt> *aprs = FOREACH raw GENERATE FLATTEN(myudf.aprs(line));*
> > 2013-06-08 01:06:37,324 [main] INFO
> >  org.apache.pig.scripting.jython.JythonFunction - Schema
> > 'time:chararray,from_call:chararray,to_call:chararray,digis:chararray,gtype:chararray,gate:chararray,info:chararray,firsthop:chararray'