Pig >> mail # user >> Pig APIs


Yeah, getting at the info here is tricky. For Ambrose we're getting info
about submitted jobs, so we can just hook into the lifecycle of
PigProgressNotificationListener. The PPNL notifications are tightly coupled
to PigStatsUtil and ScriptState, neither of which is invoked during explain.

The bulk of the action for explain happens in the PigServer.explain(..)
method. That's where the logical plan, physical plan and execution plan are
generated before explain gets called on each to print the output. We could
perhaps add some sort of listener interface and a hook here, enabled via a
configured param, that gets passed each of these plans during explain.
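
To make the idea concrete, here is a rough sketch of what such a listener
could look like. Everything in it is hypothetical: ExplainListener,
RecordingListener and the explain(..) stand-in below are names made up for
illustration and are not part of Pig, and the generic parameters L, P and E
stand in for whatever plan types PigServer.explain(..) actually builds.

```java
import java.util.ArrayList;
import java.util.List;

public class ExplainHookSketch {

    /** Hypothetical callback interface; nothing like this exists in Pig today. */
    public interface ExplainListener<L, P, E> {
        void onLogicalPlan(L plan);
        void onPhysicalPlan(P plan);
        void onExecutionPlan(E plan);
    }

    /**
     * Stand-in for the explain path: once the three plans have been built,
     * hand each one to every registered listener. The real method would go
     * on to print the plans afterwards.
     */
    public static <L, P, E> void explain(L logical, P physical, E execution,
                                         List<ExplainListener<L, P, E>> listeners) {
        for (ExplainListener<L, P, E> listener : listeners) {
            listener.onLogicalPlan(logical);
            listener.onPhysicalPlan(physical);
            listener.onExecutionPlan(execution);
        }
    }

    /** Example listener that just records which plans it was handed. */
    public static class RecordingListener
            implements ExplainListener<String, String, String> {
        public final List<String> seen = new ArrayList<>();

        @Override
        public void onLogicalPlan(String plan) {
            seen.add("logical:" + plan);
        }

        @Override
        public void onPhysicalPlan(String plan) {
            seen.add("physical:" + plan);
        }

        @Override
        public void onExecutionPlan(String plan) {
            seen.add("execution:" + plan);
        }
    }

    public static void main(String[] args) {
        RecordingListener recorder = new RecordingListener();
        List<ExplainListener<String, String, String>> listeners = new ArrayList<>();
        listeners.add(recorder);
        // Strings stand in for the real plan objects in this sketch.
        explain("lp", "pp", "ep", listeners);
        for (String entry : recorder.seen) {
            System.out.println(entry);
        }
    }
}
```

A listener written this way could walk the plans for I/O paths and operators
without anything being executed or stored. How the listener class gets
registered (the "configured param") is left open here.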

On Tue, Jan 22, 2013 at 3:05 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:

> I think that this is all available, it's just not the easiest thing to get
> at. If you look at the explain plan, it has a lot of this info, and you can
> definitely get at that info. I'm not sure if it has the reducers or if
> that's post MR setup, but you should be able to.
>
> That said, I do not think it would hurt to have hooks in to more clearly
> do something with this info. Bill had to do stuff like this for Ambrose, so
> maybe he can weigh in on what that could look like.
>
>
> 2013/1/22 Prashant Kommireddi <[EMAIL PROTECTED]>
>
>> Jon/others - any pointers on this? I would like to patch in hooks if this
>> is not possible at the moment.
>>
>> -Prashant
>>
>> On Mon, Jan 21, 2013 at 5:47 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
>>
>> > At the moment, basically info on I/O paths, operators used (group by,
>> > foreach ..), job level info such as number of reducers etc.
>> >
>> >
>> > On Mon, Jan 21, 2013 at 5:30 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
>> >
>> >> What level of information would you like? I.e., if you do "explain
>> >> relation," which of the three do you want to hook into?
>> >>
>> >>
>> >> 2013/1/21 Prashant Kommireddi <[EMAIL PROTECTED]>
>> >>
>> >> > Been coding with the APIs and wondering if there is anything that
>> >> > allows you to only retrieve the operators, I/O paths etc. without
>> >> > actually issuing an execute or a store? Basically, being able to get
>> >> > information post-parsing of the script but pre-execution.
>> >> >
>> >> > Thanks,
>> >> > Prashant
>> >> >
>> >>
>> >
>> >
>>
>
>
--
*Note that I'm no longer using my Yahoo! email address. Please email me at
[EMAIL PROTECTED] going forward.*