Re: Pig UDF question
Thanks for the reference. Yes, I am aware of it, but I can't use it as is.
For future reference, it would also be good for me to know:

1. If I create a static member in the UDF class, is there one instance per
   mapper task?
2. Is there a method that gets called at the end of the mapper that I
   can use for cleanup?

On the same subject, is it better to index in a UDF or in a StoreFunc? I am
trying to work out how to decide in a case like this, where you are
interacting with an external system.
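
For reference, the initialize-once, close-at-the-end pattern asked about above can be sketched in plain Java. This is only an illustration under assumptions: FakeSearchClient, IndexingFunc and LifecycleDemo are hypothetical names, not Pig or Elasticsearch classes, and the loop in main stands in for whatever the framework does per record. Pig's EvalFunc does declare a finish() method (a no-op by default) intended for cleanup, which finish() below imitates; whether and when a given Pig version calls it should be checked against that version.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a search client; not a real Elasticsearch API.
class FakeSearchClient implements AutoCloseable {
    final List<String> indexed = new ArrayList<>();
    boolean closed = false;

    void index(String json) {
        if (closed) {
            throw new IllegalStateException("client already closed");
        }
        indexed.add(json);
    }

    @Override
    public void close() {
        closed = true;
    }
}

// Hypothetical class shaped like a UDF: exec() runs per record, finish() once at the end.
class IndexingFunc {
    private FakeSearchClient client;   // created lazily, on the first record
    private int clientsCreated = 0;

    String exec(String json) {
        if (client == null) {
            client = new FakeSearchClient();
            clientsCreated++;
        }
        client.index(json);
        return json;
    }

    void finish() {
        // Release the external resource only if it was ever opened.
        if (client != null) {
            client.close();
        }
    }

    FakeSearchClient client() { return client; }
    int clientsCreated() { return clientsCreated; }
}

class LifecycleDemo {
    public static void main(String[] args) {
        IndexingFunc func = new IndexingFunc();
        for (String record : new String[] {"{\"id\":1}", "{\"id\":2}"}) {
            func.exec(record);
        }
        func.finish();
        System.out.println("clients created: " + func.clientsCreated());
        System.out.println("closed: " + func.client().closed);
    }
}
```

The sketch keeps the client in an instance field rather than a static one. A static field would be shared by everything loaded in the same JVM, so whether it amounts to "one per mapper task" depends on how the cluster assigns tasks to JVMs.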

On Tue, May 15, 2012 at 6:03 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:

> Are you aware of Wonderdog, which already does this? Unfortunately,
> finding reusable Pig components can be very hard, as they exist across
> many proprietary projects.
>
> https://github.com/infochimps/wonderdog
> A post explaining how to use it, end to end, is here:
>
> http://www.quora.com/Autocomplete/What-is-the-best-way-to-implement-an-autocomplete-search-feature-when-dealing-with-large-data-sets
>
> Russell Jurney http://datasyndrome.com
>
> On May 15, 2012, at 4:18 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>
> > I am trying to write a UDF that indexes data in Elasticsearch after
> > converting it to JSON. I had 2 questions:
> >
> > 1. If I create a static member in the UDF class, is that one instance per
> > mapper task?
> > 2. Is there a method that gets called at the end of the mapper that I
> > can use for cleanup?
> >
> > I was wondering if I should instead write a StoreFunc that would index the
> > data. I need some help here: essentially, I need some way to initialize the
> > search client once and then close it at the end.
>