Re: Sending parameters to a customer load function
It seems to me that Alan is only interested in writing a loader which has a
non-default constructor (takes arguments), he doesn't need to create a UDF
which has this property.

Besides SimpleTextLoader, there are a number of examples of this in the Pig
codebase, including HBaseStorage.  My own attempt at a basic no-op loader
that has a non-default constructor also worked fine.

Alan -- can you share the full loader implementation where you're seeing
this issue?  Also, what version of Pig are you using?  Error #2999 makes me
wonder whether you're hitting an uncaught exception elsewhere in your
loader implementation.
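
To illustrate that last point, here is a minimal Java sketch of how a class
can be constructed by name with String arguments through reflection.  This is
not Pig's actual code; the names ReflectiveInstantiation, GoodLoader,
BrokenLoader and instantiate are hypothetical, and the classes are plain
stand-ins rather than real LoadFunc implementations.  It only shows that an
exception thrown inside a constructor can surface as a generic "could not
instantiate" failure even though the String constructor exists.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;
import java.util.Arrays;

// Illustration only: hypothetical names, not part of Pig.
class ReflectiveInstantiation {

    /** Stand-in for a loader whose String constructor succeeds. */
    public static class GoodLoader {
        final String parms;
        public GoodLoader() { this(""); }
        public GoodLoader(String parms) { this.parms = parms; }
    }

    /** Stand-in for a loader whose String constructor throws. */
    public static class BrokenLoader {
        public BrokenLoader() { }
        public BrokenLoader(String parms) {
            throw new IllegalStateException("failed while handling '" + parms + "'");
        }
    }

    /** Builds an instance by class name, passing every argument as a String. */
    static Object instantiate(String className, String... args) {
        String message = "could not instantiate '" + className
                + "' with arguments '" + Arrays.toString(args) + "'";
        try {
            Class<?> clazz = Class.forName(className);
            Class<?>[] paramTypes = new Class<?>[args.length];
            Arrays.fill(paramTypes, String.class);
            // Only public constructors are found by getConstructor.
            Constructor<?> ctor = clazz.getConstructor(paramTypes);
            return ctor.newInstance((Object[]) args);
        } catch (InvocationTargetException e) {
            // The constructor was found but threw; keep the real cause.
            throw new RuntimeException(message, e.getCause());
        } catch (ReflectiveOperationException e) {
            // Class or constructor missing, not accessible, or abstract.
            throw new RuntimeException(message, e);
        }
    }

    public static void main(String[] unused) {
        Object ok = instantiate(GoodLoader.class.getName(), "all");
        System.out.println("created " + ok.getClass().getSimpleName());
        try {
            instantiate(BrokenLoader.class.getName(), "all");
        } catch (RuntimeException e) {
            System.out.println(e.getMessage() + " caused by " + e.getCause());
        }
    }
}
```

In a sketch like this the top-level message looks the same whether the
constructor is missing or throws, so the nested cause in the full stack trace
(for Pig, typically the log file the grunt shell points to) is the place to
look.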

Norbert

On Mon, Apr 9, 2012 at 11:35 AM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:

> Hi Alan,
> when you use a loader:
> A  = load 'stuff' using my.pig.Loader('foo', 'bar');
>
> the loader gets constructed with 'foo', 'bar', then it gets set up
> (with the various setSignature, prepareToRead, etc, calls), and its
> getNext() gets called repeatedly until there is nothing left to read.
>
> when you use a udf:
> B = foreach A generate my.pig.UDF($0);
>
> pig iterates through relation A and invokes the UDF's exec method on a
> tuple composed of the fields specified in your script -- in this case,
> the first field in each row of A.  If you want to have a non-default
> constructor used to create the UDF instance that will be exec'd on all
> these tuples, you can do this through a "define" call as I described
> earlier.
>
> Loaders (and Storers) are very different from UDFs in how they are
> used and invoked, and they implement totally different interfaces.
>
> -Dmitriy
>
> On Mon, Apr 9, 2012 at 5:34 AM, Walker, Alan <[EMAIL PROTECTED]>
> wrote:
> > Dmitriy,
> >
> > I have also tried that pattern for a Loader and it doesn't find the
> String constructor, it only works with the void constructor.
> >
> > grunt> define myreader com.sabre.pigshop.ShoppingReader('all');
> > grunt> A = LOAD '/user/alanw/*.xml' USING myreader() AS (x);
> > 2012-04-09 07:33:58,502 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 2999: Unexpected internal error. could not instantiate
> 'com.sabre.pigshop.ShoppingReader' with arguments '[all]'
> >
> >
> > This works:
> >
> > grunt> define myreader com.sabre.pigshop.ShoppingReader();
> > grunt> A = LOAD '/user/alanw/*.xml' USING myreader AS (x);
> >
> >
> > I haven't dug into the Pig source yet, perhaps the Loader functions are
> treated differently than another UDF?  Seems unlikely.
> >
> > Thanks,
> > Alan
> >
> >
> > -----Original Message-----
> > From: Dmitriy Ryaboy [mailto:[EMAIL PROTECTED]]
> > Sent: Friday, April 06, 2012 6:21 PM
> > To: [EMAIL PROTECTED]; Walker, Alan
> > Subject: Re: Sending parameters to a customer load function
> >
> > Hi Alan,
> > You can use "define" to supply an argument to a UDF constructor.
> >
> > You can see an example here:
> >
> http://ofps.oreilly.com/titles/9781449302641/intro_pig_latin.html#udf_define
> >
> > I did just find to my surprise that this isn't in our documentation.
> > we should add that.
> >
> > D
> >
> > On Fri, Apr 6, 2012 at 1:38 PM, Walker, Alan <[EMAIL PROTECTED]>
> wrote:
> >> Hi,
> >>
> >> I'm having some challenges with a load function.  It only seems to
> work with a void constructor.  The Java code has a void constructor and a
> String constructor, much like the SimpleTextLoader example.  Any thoughts
> on what might be going wrong?
> >>
> >>    public ShoppingReader() {
> >>        parms = "";
> >>    }
> >>
> >>    public ShoppingReader(String tmp) {
> >>        parms = tmp;
> >>    }
> >>
> >> grunt> A = LOAD '/user/alanw/*.xml' USING
> com.sabre.pigshop.ShoppingReader('all') AS (x);
> >> 2012-04-06 16:04:08,593 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 2999: Unexpected internal error. could not instantiate
> 'com.sabre.pigshop.ShoppingReader' with arguments '[all]'
> >>
> >> Thanks,
> >> Alan.
> >>
>