Pig >> mail # user >> Counters from Python UDF


Re: Counters from Python UDF
That's good to know! Thanks for following up with that, Will. I guess
there's no reason "incrCounter" can't be static.

2012/8/24 Duckworth, Will <[EMAIL PROTECTED]>

> Code below works against trunk.
>
> Apache Pig version 0.11.0-SNAPSHOT (r1372967)
> compiled Aug 14 2012, 15:31:10
>
> pig -f test_counter.pig -p in_path=/path/to/file/test_file.gz -p job_name=counter_test
>
> *** test_counter.py
> from org.apache.pig.tools.counters import PigCounterHelper
>
> @outputSchema("line:chararray")
> def testCounter(line):
>         counter = PigCounterHelper()
>         counter.incrCounter("Test","udfcounter",1)
>         return line
>
> *** test_counter.pig
> -- $in_path
> -- $job_name
>
> SET job.name '$job_name';
>
> REGISTER '/path/to/python_file/test_counter.py' USING jython AS udf;
>
> A = load '$in_path' using PigStorage('\n') as (line:chararray);
>
> A2 = foreach A generate udf.testCounter(line) as line;
> A3 = limit A2 10;
> dump A3;
>
> Will Duckworth  Senior Vice President, Software Engineering  | comScore,
> Inc.(NASDAQ:SCOR)
> o +1 (703) 438-2108 | m +1 (301) 606-2977 | mailto:[EMAIL PROTECTED]
>
> .....................................................................................................
> -----Original Message-----
> From: Jonathan Coveney [mailto:[EMAIL PROTECTED]]
> Sent: Friday, August 24, 2012 1:31 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Counters from Python UDF
>
> I think adding a helper method to the Jython/JRuby scripting engines is absolutely the way to go.
>
> 2012/8/24 Aniket Mokashi <[EMAIL PROTECTED]>
>
> > I used the following in my Python UDF (on Pig 0.9) after referring to
> >
> > http://squarecog.wordpress.com/2010/12/24/incrementing-hadoop-counters-in-apache-pig/
> >
> >
> > from org.apache.pig.tools.pigstats import PigStatusReporter
> > reporter = PigStatusReporter.getInstance()
> >
> > But it looks like the context is not set in PigStatusReporter when the
> > UDF is invoked, so it fails. I think we need some caching logic similar
> > to PigCounterHelper's, which holds increments until something sets the
> > context. I wonder how that part works.
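A minimal pure-Python sketch of the caching logic described above, assuming a hypothetical `FakeReporter` that may not be attached yet when the UDF first runs (this models the idea only; it is not PigStatusReporter's real API). Increments are buffered while no reporter exists and flushed on the first call after one appears:

```python
from collections import defaultdict


class FakeReporter:
    """Hypothetical reporter standing in for a task's counter context."""

    def __init__(self):
        self.counters = defaultdict(int)

    def incr(self, group, name, delta):
        self.counters[(group, name)] += delta


class BufferingCounter:
    """Buffers increments until a reporter is attached, then flushes them."""

    def __init__(self):
        self.reporter = None              # not set yet when the UDF first runs
        self.pending = defaultdict(int)

    def incr_counter(self, group, name, delta):
        if self.reporter is None:
            # No context yet: remember the increment instead of failing.
            self.pending[(group, name)] += delta
            return
        # Context available: push anything cached earlier, then this one.
        for (g, n), d in self.pending.items():
            self.reporter.incr(g, n, d)
        self.pending.clear()
        self.reporter.incr(group, name, delta)


helper = BufferingCounter()
helper.incr_counter("Test", "udfcounter", 1)   # buffered
helper.incr_counter("Test", "udfcounter", 1)   # buffered
helper.reporter = FakeReporter()               # context arrives later
helper.incr_counter("Test", "udfcounter", 1)   # flushes 2, then adds 1
print(helper.reporter.counters[("Test", "udfcounter")])  # 3
```

One consequence of this design: anything still buffered when the task ends without a further call is never flushed, so a real helper would also need a flush at close time.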
> >
> > We could add a helper UDF in JythonScriptingEngine.init (or some similar
> > method) to expose these cleanly. Thoughts?
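A minimal plain-Python sketch of the "expose a helper from the scripting engine" idea, assuming a hypothetical engine that pre-populates the namespace a UDF script is executed in. The names `incr_counter`, `load_udf_script` and `COUNTERS` are illustrative only, not JythonScriptingEngine's API. The UDF script calls the helper without importing any Pig class:

```python
from collections import defaultdict

# Hypothetical counter store owned by the engine.
COUNTERS = defaultdict(int)


def incr_counter(group, name, delta=1):
    """Helper the engine would expose to every registered script."""
    COUNTERS[(group, name)] += delta


def load_udf_script(source):
    """Execute a UDF script in a namespace pre-seeded with the helper."""
    namespace = {"incr_counter": incr_counter}
    exec(source, namespace)
    return namespace


UDF_SOURCE = '''
def testCounter(line):
    incr_counter("Test", "udfcounter")
    return line
'''

udfs = load_udf_script(UDF_SOURCE)
for line in ["x", "y"]:
    udfs["testCounter"](line)

print(COUNTERS[("Test", "udfcounter")])  # 2
```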
> >
> > ~Aniket
> >
> > On Thu, Aug 23, 2012 at 2:43 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
> >
> > > In trunk this should be possible (it's possible in 0.10 as well; I am
> > > just not sure whether PigCounterHelper is there). Either way, take a
> > > look at PigCounterHelper. All you have to do is instantiate a copy in
> > > your UDF and use it from there.
> > >
> > > This hinges on all of the static state that Pig relies on working
> > > correctly. I think the way we invoke these scripting languages should
> > > handle that, but this will verify it :)
> > >
> > > 2012/8/23 Duckworth, Will <[EMAIL PROTECTED]>
> > >
> > > > This may be a better question for the DEV list, but is it even
> > > > possible / feasible? Could it be done by calling the Java classes
> > > > from within Jython?
> > > >
> > > > I would ask the same about Algebraic and Accumulator UDFs, which I
> > > > know are available in Ruby.
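For context on the question above: in Pig's Java API, an Algebraic UDF names three stages (getInitial, getIntermed, getFinal) so the work can be split across map, combine and reduce, while an Accumulator UDF receives its bag in chunks through accumulate, followed by getValue and cleanup. A pure-Python sketch of both contracts for an average, with hypothetical function and class names; it mirrors the shape of the interfaces, not their Java signatures:

```python
def initial(values):
    """Map side: reduce raw values to a partial (sum, count)."""
    return (sum(values), len(values))


def intermediate(partials):
    """Combiner: merge partial (sum, count) pairs."""
    return (sum(p[0] for p in partials), sum(p[1] for p in partials))


def final(partials):
    """Reducer: merge the remaining partials and produce the average."""
    total, count = intermediate(partials)
    return total / count if count else None


class AvgAccumulator:
    """Accumulator-style average: fed the bag in chunks."""

    def __init__(self):
        self.cleanup()

    def accumulate(self, chunk):
        self.total += sum(chunk)
        self.count += len(chunk)

    def get_value(self):
        return self.total / self.count if self.count else None

    def cleanup(self):
        # Reset state before the next group's key.
        self.total = 0
        self.count = 0


# Algebraic: two "mappers" see different slices of the same group.
parts = [initial([1, 2, 3]), initial([4, 5])]
print(final([intermediate(parts)]))  # 3.0

acc = AvgAccumulator()
for chunk in ([1, 2], [3, 4], [5]):
    acc.accumulate(chunk)
print(acc.get_value())  # 3.0
```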
> > > >
> > > > -----Original Message-----
> > > > From: Aniket Mokashi [mailto:[EMAIL PROTECTED]]
> > > > Sent: Friday, August 17, 2012 5:54 PM
> > > > To: [EMAIL PROTECTED]
> > > > Subject: Re: Counters from Python UDF
> > > >
> > > > I don't think there is a way at this point. You may have to open a JIRA.
> > > >
> > > > Thanks,
> > > > Aniket
> > > >
> > > > On Fri, Aug 17, 2012 at 7:03 AM, Duckworth, Will <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Has anyone poked around to see if there is a way to create /
> > > > > increment counters from a Python UDF? Thanks.