Re: Pig UDF question
Thanks for the reference. Yes, I am aware of it, but I can't use it as is.
For future reference, it would also be good for me to know:

1. If I create a static member in a UDF class, is there one instance per
   mapper task?
2. Is there a method that gets called at the end of the mapper that I can
   use for cleanup?

On the same subject, is it better to index in a UDF or in a StoreFunc? I am
trying to work out how to decide in a case like this, where you are
interacting with an external system.
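
To make the two questions concrete, here is a minimal sketch in plain Java, with no Pig or Elasticsearch dependency, of the pattern I have in mind: a lazily created static client plus an end-of-task cleanup hook. `SearchClient`, `IndexingUdf`, and `UdfLifecycleDemo` are hypothetical stand-ins, not real APIs. In an actual UDF, `exec` and `finish` would correspond to `EvalFunc.exec(Tuple)` and `EvalFunc.finish()`; my assumption, still to be verified for the Pig version in use, is that `finish()` is reliably invoked at the end of the task. A static field exists once per class loader in a JVM, so whether that means "once per mapper task" depends on how tasks map to JVMs on the cluster (for example, whether JVM reuse is enabled).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a search client; this is NOT the Elasticsearch API.
final class SearchClient implements AutoCloseable {
    static int opened = 0;                          // how many clients were created in this JVM
    final List<String> indexed = new ArrayList<>(); // documents "sent" to the index
    boolean closed = false;

    SearchClient() { opened++; }

    void index(String json) {
        if (closed) throw new IllegalStateException("client is closed");
        indexed.add(json);
    }

    @Override
    public void close() { closed = true; }
}

// Mirrors the shape of a Pig EvalFunc: exec() per record, finish() for cleanup.
class IndexingUdf {
    // Static: one instance per JVM (per class loader), shared by every
    // IndexingUdf object created in that JVM.
    private static SearchClient client;

    // Lazily create the shared client on first use (or after it was closed).
    static synchronized SearchClient client() {
        if (client == null || client.closed) {
            client = new SearchClient();
        }
        return client;
    }

    // Called once per record: index the JSON document and pass it through.
    String exec(String json) {
        client().index(json);
        return json;
    }

    // Cleanup hook: close the shared client if one was ever opened.
    void finish() {
        synchronized (IndexingUdf.class) {
            if (client != null) client.close();
        }
    }
}

class UdfLifecycleDemo {
    public static void main(String[] args) {
        IndexingUdf first = new IndexingUdf();
        IndexingUdf second = new IndexingUdf();
        first.exec("{\"id\":1}");
        second.exec("{\"id\":2}");
        // Both UDF objects went through the same static client.
        System.out.println("clients opened: " + SearchClient.opened);
        first.finish();
    }
}
```

Because the client is shared, a `finish()` call on any one UDF object closes it for all of them in that JVM; the sketch reopens the client on the next `exec`, which may or may not be what a real indexing job wants.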

On Tue, May 15, 2012 at 6:03 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:

> Are you aware of Wonderdog, which already does this?  Unfortunately,
> finding reusable pig components can be very hard, as they exist across
> many proprietary projects.
>
> https://github.com/infochimps/wonderdog
> A post explaining how to use it, end to end, is here:
>
> http://www.quora.com/Autocomplete/What-is-the-best-way-to-implement-an-autocomplete-search-feature-when-dealing-with-large-data-sets
>
> Russell Jurney http://datasyndrome.com
>
> On May 15, 2012, at 4:18 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>
> > I am trying to write a UDF that indexes data in elasticsearch after
> > converting it to JSON. I had 2 questions:
> >
> > 1. If I create a static member in UDF class is that one instance per
> > mapper task?
> > 2. Is there a method that gets called at the end of mapper method that I
> > can use for cleanup?
> >
> > I was wondering if I should rather write a storefunc that would index the
> > data. Need some help here, essentially I need some way to initialize the
> > search client once and then at the end close it out.
>