Pig >> mail # user >> Passing a BAG to Pig UDF constructor?


Re: Passing a BAG to Pig UDF constructor?
That's a good idea (to pass the bag to the UDF and initialize it on the
first UDF invocation). Thanks.
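
For reference, the lazy-initialization idea could look roughly like the
sketch below. It is plain Java with no Pig classes, so it only illustrates
the pattern: the class name `LazyLookup`, the `exec(key, bag)` signature,
and the use of a `List` of map entries in place of a Pig bag are all
assumptions made for the example, not Pig's API.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a UDF; a real Pig UDF would extend EvalFunc
// and receive a Tuple, which is not modeled here.
class LazyLookup {
    // Stays null until the first exec call builds it.
    private Map<String, Integer> lookup;
    // Counts how many times the map was built (for illustration only).
    private int initCount = 0;

    // "bag" stands in for the bag passed as part of the input.
    // It is read only on the first call and ignored afterwards.
    public Integer exec(String key, List<Map.Entry<String, Integer>> bag) {
        if (lookup == null) {
            lookup = new HashMap<>();
            for (Map.Entry<String, Integer> e : bag) {
                lookup.put(e.getKey(), e.getValue());
            }
            initCount++;
        }
        // Returns null when the key is not in the map.
        return lookup.get(key);
    }

    public int getInitCount() {
        return initCount;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> bag = new ArrayList<>();
        bag.add(new SimpleEntry<String, Integer>("a", 1));
        bag.add(new SimpleEntry<String, Integer>("b", 2));

        LazyLookup udf = new LazyLookup();
        System.out.println(udf.exec("b", bag)); // prints 2
        System.out.println(udf.exec("a", bag)); // prints 1, map is reused
    }
}
```

Note that in this shape the bag is still handed to every call, even
though it is only read on the first one.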

Why do you think it is expensive, Mridul?

On Tue, Jun 26, 2012 at 2:50 PM, Mridul Muralidharan
<[EMAIL PROTECTED]> wrote:

>
>
> > -----Original Message-----
> > From: Jonathan Coveney [mailto:[EMAIL PROTECTED]]
> > Sent: Wednesday, June 27, 2012 3:12 AM
> > To: [EMAIL PROTECTED]
> > Subject: Re: Passing a BAG to Pig UDF constructor?
> >
> > You can also just pass the bag to the UDF, and have a lazy initializer
> > in exec that loads the bag into memory.
>
>
> Can you elaborate on what you mean by passing the bag to the UDF?
> Pass it as part of the input to the UDF in exec and initialize it only
> once (the first time)? (If yes, this is expensive.)
> Or something else?
>
>
> Regards,
> Mridul
>
>
>
> >
> > 2012/6/26 Mridul Muralidharan <[EMAIL PROTECTED]>
> >
> > > You could dump the data in a dfs file and pass the location of the
> > > file as param to your udf in define - so that it initializes itself
> > > using that data ...
> > >
> > >
> > > - Mridul
> > >
> > >
> > > > -----Original Message-----
> > > > From: Dexin Wang [mailto:[EMAIL PROTECTED]]
> > > > Sent: Tuesday, June 26, 2012 10:58 PM
> > > > To: [EMAIL PROTECTED]
> > > > Subject: Passing a BAG to Pig UDF constructor?
> > > >
> > > > Is it possible to pass a bag to a Pig UDF constructor?
> > > >
> > > > Basically in the constructor I want to initialize some hash map so
> > > > that on every exec operation, I can use the hashmap to do a lookup
> > > > and find the value I need, and apply some algorithm to it.
> > > >
> > > > I realize I could just do a replicated join to achieve similar
> > > > things but the algorithm is more than a few lines and there are
> > > > some edge cases so I would rather wrap that logic inside a UDF
> > > > function.
> > > > I also realize I could just pass a file path to the constructor and
> > > > read the files to initialize the hashmap but my files are on
> > > > Amazon's S3 and I don't want to deal with the S3 API to read the
> > > > file.
> > > >
> > > > Is this possible, or are there alternative ways to achieve the
> > > > same thing?
> > > >
> > > > Thanks.
> > > > Dexin
> > >
>