Pig >> mail # user >> Pig APIs


Thanks Bill.

Sent from my iPhone

On Jan 22, 2013, at 10:10 PM, Bill Graham <[EMAIL PROTECTED]> wrote:

> Sure, here you go:
>
> https://github.com/twitter/ambrose/blob/master/pig/src/main/java/com/twitter/ambrose/pig/AmbrosePigProgressNotificationListener.java
>
>
>
> On Tue, Jan 22, 2013 at 10:04 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
>
>> Thanks Bill. Can you please point me to the Ambrose code that uses PPNL?
>>
>> I will open a JIRA for adding hooks to explain.
>>
>> Sent from my iPhone
>>
>> On Jan 22, 2013, at 9:03 PM, Bill Graham <[EMAIL PROTECTED]> wrote:
>>
>>> Yeah, getting at the info here is tricky. For Ambrose we're getting info
>>> about submitted jobs, so we can just hook into the lifecycle of
>>> PigProgressNotificationListener. The PPNL notifiers are pretty coupled to
>>> PigStatsUtil and ScriptState, which aren't invoked during explain.
>>>
>>> The bulk of the action for explain all happens in the PigServer.explain(..)
>>> method. That's where the logical plan, physical plan and execution plan are
>>> generated before explain gets called on each to print the output. We could
>>> look to add some sort of listener interface and hook here perhaps that gets
>>> each of these passed during explain via a configured param.
>>>
>>>
>>>
>>> On Tue, Jan 22, 2013 at 3:05 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
>>>
>>>> I think that this is all available, it's just not the easiest thing to get
>>>> at. If you look at the explain plan, it has a lot of this info, and you can
>>>> definitely get at that info. I'm not sure if it has the reducers or if
>>>> that's post MR setup, but you should be able to.
>>>>
>>>> That said, I do not think it would hurt to have hooks in to more clearly
>>>> do something with this info. Bill had to do stuff like this for Ambrose, so
>>>> maybe he can weigh in on what that could look like.
>>>>
>>>>
>>>> 2013/1/22 Prashant Kommireddi <[EMAIL PROTECTED]>
>>>>
>>>>> Jon/others - any pointers on this? I would like to patch in hooks if this
>>>>> is not possible at the moment.
>>>>>
>>>>> -Prashant
>>>>>
>>>>> On Mon, Jan 21, 2013 at 5:47 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> At the moment, basically info on I/O paths, operators used (group by,
>>>>>> foreach ..), job level info such as number of reducers etc.
>>>>>>
>>>>>>
>>>>>> On Mon, Jan 21, 2013 at 5:30 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>>> What level of information would you like? IE if you do "explain relation,"
>>>>>>> which of the three do you want to hook into?
>>>>>>>
>>>>>>>
>>>>>>> 2013/1/21 Prashant Kommireddi <[EMAIL PROTECTED]>
>>>>>>>
>>>>>>>> Been coding with the APIs and wondering if there is anything that allows
>>>>>>>> you to only retrieve the operators, I/O paths etc without actually issuing
>>>>>>>> an execute or a store? Basically, being able to get information
>>>>>>>> post-parsing of the script but pre-execution.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Prashant
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>
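
For reference, here is a rough sketch of what the listener hook Bill describes for PigServer.explain(..) might look like. Nothing here exists in Pig: ExplainListener, RecordingListener and the explain(..) stand-in are hypothetical names made up for illustration, and the three plans are represented as plain strings, where a real hook would presumably pass Pig's own logical, physical and execution plan objects. It only shows the shape of the idea: generate each plan, then pass it to any configured listener before (or instead of) printing it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ExplainHookSketch {

    /** Hypothetical callback interface; not part of Pig. */
    public interface ExplainListener {
        void logicalPlan(String alias, String plan);
        void physicalPlan(String alias, String plan);
        void executionPlan(String alias, String plan);
    }

    /** Example listener that records each plan it receives, in order. */
    public static class RecordingListener implements ExplainListener {
        public final List<String> seen = new ArrayList<>();

        @Override
        public void logicalPlan(String alias, String plan) {
            seen.add("logical:" + alias);
        }

        @Override
        public void physicalPlan(String alias, String plan) {
            seen.add("physical:" + alias);
        }

        @Override
        public void executionPlan(String alias, String plan) {
            seen.add("execution:" + alias);
        }
    }

    /**
     * Stand-in for the body of PigServer.explain(..). The plan strings are
     * placeholders; the point is only where the listener calls would sit.
     */
    public static void explain(String alias, List<ExplainListener> listeners) {
        String logical = "placeholder logical plan for " + alias;
        String physical = "placeholder physical plan for " + alias;
        String execution = "placeholder execution plan for " + alias;

        for (ExplainListener listener : listeners) {
            listener.logicalPlan(alias, logical);
            listener.physicalPlan(alias, physical);
            listener.executionPlan(alias, execution);
        }
    }

    public static void main(String[] args) {
        RecordingListener listener = new RecordingListener();
        explain("relation", Collections.<ExplainListener>singletonList(listener));
        System.out.println(listener.seen);
    }
}
```

Passing all three plans through one interface would let a caller pick the level it cares about (the question Jonathan raises above) without PigServer needing to know which one that is.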