Pig, mail # user - Accessing tuple field names from within a python udf


Martin Goodson 2012-11-14, 16:17
Martin Goodson 2012-11-14, 17:06
Jonathan Coveney 2012-11-15, 20:20
Martin Goodson 2012-11-16, 10:21

Re: Accessing tuple field names from within a python udf
Jonathan Coveney 2012-11-16, 19:21
In the Java interface, there is a getInputSchema() method. You could make
this available on the Python side of things. This would be a useful
addition.
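
As an illustration of what name-based access could look like once the field names are known on the Python side, here is a minimal sketch in plain Python. The decorator `with_field_names`, the UDF `clicks_per_country`, and the field names are hypothetical and not part of Pig; the sketch also assumes the UDF receives its input row as a single tuple or list.

```python
def with_field_names(field_names):
    """Wrap a UDF so it receives a dict keyed by field name
    instead of a positional tuple. Hypothetical helper, not a Pig API."""
    # Build the name -> position lookup once, not on every call.
    index = dict((name, i) for i, name in enumerate(field_names))

    def decorator(udf):
        def wrapper(row):
            if len(row) != len(field_names):
                raise ValueError("expected %d fields, got %d"
                                 % (len(field_names), len(row)))
            # Translate positions to names before calling the real UDF.
            return udf(dict((name, row[i]) for name, i in index.items()))
        return wrapper
    return decorator


# Field names are made up for the example.
@with_field_names(["user_id", "country", "clicks"])
def clicks_per_country(fields):
    return "%s:%d" % (fields["country"], fields["clicks"])


print(clicks_per_country((42, "UK", 7)))
```

With this shape, a change in column order only requires updating the list of field names in one place, not every hard-coded index inside the UDF.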
2012/11/16 Martin Goodson <[EMAIL PROTECTED]>

> Unfortunately I've realised that boundscript.describe doesn't return a
> string. It returns void but prints to stdout. This means I have to go
> through a rather painful process of calling a separate Python process that
> calls boundscript.describe and then capturing the stdout of that process in
> order to obtain the schema. I don't know why it doesn't return a string.
> Maybe there is an easier way I am missing here. If people have any ideas
> for a more elegant solution I would be happy to develop it and
> contribute the code.
>
> Martin
>
> On 15 November 2012 20:20, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
>
> > Martin,
> >
> > That is a reasonable workaround. Even in Java UDFs, you can't directly
> > access fields by name. Tuples are indexed only by numbers. Using the
> > Schema is how I would do it.
> >
> >
> > 2012/11/14 Martin Goodson <[EMAIL PROTECTED]>
> >
> > > Sorry to reply to my own post, but I've found a workaround that I
> > > thought I should put here:
> > >
> > > Use embedded Pig.
> > > Access the schema with boundscript.describe().
> > > Pass the schema as a parameter into the UDF call.
> > >
> > > Thanks
> > > Martin
> > >
> > > On 14 November 2012 16:17, Martin Goodson <[EMAIL PROTECTED]> wrote:
> > >
> > > > I normally deal with very large tuples with many fields. It's a pain to
> > > > deal with these in Python UDFs since I can't figure out a way to pass
> > > > schemas into the UDF. I have to hard-code the column numbers in the UDFs,
> > > > which is a maintenance nightmare.
> > > >
> > > > It seems that Java UDFs receive the full tuple in their exec methods so
> > > > that the correct fields can be identified, whereas Python UDFs only receive
> > > > list objects (with field names stripped). Is there any way to get the
> > > > behaviour of Python UDFs to conform to the Java behaviour?
> > > >
> > > >
> > > > Thanks for any ideas
> > > > Martin
> > > >
> > > >
> > >
> >
>
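
The workaround discussed in this thread (run a separate process, capture the schema it prints to stdout, then hand the field names to the UDF) can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual Pig integration: `CHILD_SCRIPT` is a stand-in that only prints a made-up schema line in the style of Pig's DESCRIBE output where a real child script would call boundscript.describe(), the helper names are hypothetical, and the driver is assumed to run under CPython 3.

```python
import subprocess
import sys

# Stand-in for a child script that would call boundscript.describe();
# here it only prints a made-up schema line.
CHILD_SCRIPT = (
    'print("data: {user_id: int,country: chararray,'
    'visits: {(ts: long,url: chararray)}}")'
)


def capture_stdout(args):
    """Run a command and return the text it printed to stdout."""
    result = subprocess.run(args, stdout=subprocess.PIPE,
                            check=True, universal_newlines=True)
    return result.stdout


def top_level_field_names(describe_text):
    """Extract top-level field names from 'alias: {name: type,...}' text.

    Hypothetical helper: nested bags and tuples are skipped over, and only
    the names at the outermost level are returned.
    """
    body = describe_text[describe_text.index("{") + 1:
                         describe_text.rindex("}")]
    names, depth, current = [], 0, []
    for ch in body:
        if ch in "({":
            depth += 1
        elif ch in ")}":
            depth -= 1
        if ch == "," and depth == 0:
            # A comma outside any nesting ends the current field.
            names.append("".join(current).split(":", 1)[0].strip())
            current = []
        else:
            current.append(ch)
    if "".join(current).strip():
        names.append("".join(current).split(":", 1)[0].strip())
    return names


schema_text = capture_stdout([sys.executable, "-c", CHILD_SCRIPT])
field_names = top_level_field_names(schema_text)
# Name -> column position, ready to pass to a UDF as a parameter.
position = dict((name, i) for i, name in enumerate(field_names))
print(position)
```

The resulting name-to-position mapping replaces hard-coded column numbers, so the UDF can look up `position["country"]` and keep working when the column order changes.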