Pig >> mail # user >> Counters from Python UDF


Re: Counters from Python UDF
That's good to know! Thanks for following up with that, Will. I guess
there's no reason "incrCounter" can't be static.

2012/8/24 Duckworth, Will <[EMAIL PROTECTED]>

> Code below works against trunk.
>
> Apache Pig version 0.11.0-SNAPSHOT (r1372967)
> compiled Aug 14 2012, 15:31:10
>
> pig -f test_counter.pig -p in_path=/path/to/file/test_file.gz -p job_name=counter_test
>
> *** test_counter.py
> from org.apache.pig.tools.counters import PigCounterHelper
>
> @outputSchema("line:chararray")
> def testCounter(line):
>     # Increment counter "udfcounter" in group "Test" once per input line.
>     counter = PigCounterHelper()
>     counter.incrCounter("Test", "udfcounter", 1)
>     return line
>
> *** test_counter.pig
> -- $in_path
> -- $job_name
>
> SET job.name '$job_name';
>
> REGISTER '/path/to/python_file/test_counter.py' USING jython AS udf;
>
> A = load '$in_path' using PigStorage('\n') as (line:chararray);
>
> A2 = foreach A generate udf.testCounter(line) as line;
> A3 = limit A2 10;
> dump A3;
>
>
>
>
> Will Duckworth  Senior Vice President, Software Engineering  | comScore,
> Inc.(NASDAQ:SCOR)
> o +1 (703) 438-2108 | m +1 (301) 606-2977 | mailto:[EMAIL PROTECTED]
>
> .....................................................................................................
> -----Original Message-----
> From: Jonathan Coveney [mailto:[EMAIL PROTECTED]]
> Sent: Friday, August 24, 2012 1:31 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Counters from Python UDF
>
> I think adding a method to jython/jruby is absolutely the way to go.
>
> 2012/8/24 Aniket Mokashi <[EMAIL PROTECTED]>
>
> > I used the following in my Python UDF (on Pig 0.9) after referring to:
> >
> > http://squarecog.wordpress.com/2010/12/24/incrementing-hadoop-counters-in-apache-pig/
> >
> >
> > from org.apache.pig.tools.pigstats import PigStatusReporter
> > reporter = PigStatusReporter.getInstance();
> >
> > But it looks like the context is not set in PigStatusReporter when the
> > UDF is invoked, so it fails. I think we need some caching logic similar
> > to PigCounterHelper, until something sets the context in
> > PigCounterHelper. I wonder how this works.
> >
> > We can add a helper udf at JythonScriptingEngine.init (or some such)
> > method to expose these elegantly. Thoughts?
> >
> > ~Aniket
> >
> > On Thu, Aug 23, 2012 at 2:43 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
> >
> > > In trunk this should be possible (it's possible in 0.10 as well, I
> > > > just am not sure if PigCounterHelper is there). Either way, take a
> > > > look at PigCounterHelper. All you have to do is instantiate a copy
> > > > in your UDF and use it from there.
> > >
> > > This hinges on all of the static stuff that Pig relies on working...
> > > I think that the way that we invoke these scripting languages should
> > > work, but this will verify that :)
> > >
> > > 2012/8/23 Duckworth, Will <[EMAIL PROTECTED]>
> > >
> > > > This may be a better question for the DEV list, but is it even
> > > > > possible / feasible? Could it be done by calling the Java classes
> > > > > from within Jython?
> > > > >
> > > > > I guess I would ask the same about algebraic and accumulator UDFs,
> > > > > which I know are available in Ruby.
> > > >
> > > > -----Original Message-----
> > > > From: Aniket Mokashi [mailto:[EMAIL PROTECTED]]
> > > > Sent: Friday, August 17, 2012 5:54 PM
> > > > To: [EMAIL PROTECTED]
> > > > Subject: Re: Counters from Python UDF
> > > >
> > > > > I don't think there is a way at this point. You may have to open a
> > > > > JIRA.
> > > >
> > > > Thanks,
> > > > Aniket
> > > >
> > > > > On Fri, Aug 17, 2012 at 7:03 AM, Duckworth, Will <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > > Has anyone poked around to see if there is a way to create
> > > > > > / increment counters from a Python UDF?  Thanks.
> > > > >
> > > > >
> > > > >
> > > > > > Will Duckworth Senior Vice President, Software Engineering |
> > > > > > comScore, Inc. (NASDAQ:SCOR)
> > > > >
> > > > > o +1 (703) 438-2108 | m +1 (301) 606-2977 |