Pig, mail # user - Pig APIs


Prashant Kommireddi 2013-01-22, 01:26
Jonathan Coveney 2013-01-22, 01:30
Prashant Kommireddi 2013-01-22, 01:47
Prashant Kommireddi 2013-01-22, 22:43
Jonathan Coveney 2013-01-22, 23:05
Bill Graham 2013-01-23, 05:02
Prashant Kommireddi 2013-01-23, 06:04
Bill Graham 2013-01-23, 06:09

Re: Pig APIs
Prashant Kommireddi 2013-01-23, 07:25
Thanks Bill.

Sent from my iPhone

On Jan 22, 2013, at 10:10 PM, Bill Graham <[EMAIL PROTECTED]> wrote:

> Sure, here you go:
>
> https://github.com/twitter/ambrose/blob/master/pig/src/main/java/com/twitter/ambrose/pig/AmbrosePigProgressNotificationListener.java
>
>
>
> On Tue, Jan 22, 2013 at 10:04 PM, Prashant Kommireddi
> <[EMAIL PROTECTED]> wrote:
>
>> Thanks Bill. Can you please point me to the Ambrose code that uses PPNL?
>>
>> I will open a JIRA for adding hooks to explain.
>>
>> Sent from my iPhone
>>
>> On Jan 22, 2013, at 9:03 PM, Bill Graham <[EMAIL PROTECTED]> wrote:
>>
>>> Yeah, getting at the info here is tricky. For Ambrose we're getting info
>>> about submitted jobs, so we can just hook into the lifecycle of
>>> PigProgressNotificationListener. The PPNL notifiers are pretty coupled to
>>> PigStatsUtil and ScriptState, which aren't invoked during explain.
>>>
>>> The bulk of the action for explain all happens in the PigServer.explain(..)
>>> method. That's where the logical plan, physical plan and execution plan are
>>> generated before explain gets called on each to print the output. We could
>>> look to add some sort of listener interface and hook here perhaps that gets
>>> each of these passed during explain via a configured param.
>>>
>>>
>>>
>>> On Tue, Jan 22, 2013 at 3:05 PM, Jonathan Coveney <[EMAIL PROTECTED]>
>>> wrote:
>>>
>>>> I think that this is all available, it's just not the easiest thing to get
>>>> at. If you look at the explain plan, it has a lot of this info, and you can
>>>> definitely get at that info. I'm not sure if it has the reducers or if
>>>> that's post MR setup, but you should be able to.
>>>>
>>>> That said, I do not think it would hurt to have hooks in to more clearly
>>>> do something with this info. Bill had to do stuff like this for Ambrose, so
>>>> maybe he can weigh in on what that could look like.
>>>>
>>>>
>>>> 2013/1/22 Prashant Kommireddi <[EMAIL PROTECTED]>
>>>>
>>>>> Jon/others - any pointers on this? I would like to patch in hooks if this
>>>>> is not possible at the moment.
>>>>>
>>>>> -Prashant
>>>>>
>>>>> On Mon, Jan 21, 2013 at 5:47 PM, Prashant Kommireddi <[EMAIL PROTECTED]>
>>>>> wrote:
>>>>>
>>>>>> At the moment, basically info on I/O paths, operators used (group by,
>>>>>> foreach ..), job level info such as number of reducers etc.
>>>>>>
>>>>>>
>>>>>> On Mon, Jan 21, 2013 at 5:30 PM, Jonathan Coveney <[EMAIL PROTECTED]>
>>>>>> wrote:
>>>>>>
>>>>>>> What level of information would you like? I.e., if you do "explain
>>>>>>> relation", which of the three plans do you want to hook into?
>>>>>>>
>>>>>>>
>>>>>>> 2013/1/21 Prashant Kommireddi <[EMAIL PROTECTED]>
>>>>>>>
>>>>>>>> Been coding with the APIs and wondering if there is anything that allows
>>>>>>>> you to only retrieve the operators, I/O paths etc without actually issuing
>>>>>>>> an execute or a store? Basically, being able to get information
>>>>>>>> post-parsing of the script but pre-execution.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Prashant
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
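
For anyone picking this up later, here is a rough sketch of the kind of explain-time listener hook discussed in the thread. Nothing in it is Pig API: ExplainListener, PlanSummary, CollectingListener and the explain(..) method below are all hypothetical stand-ins, as are the operator names and paths. It only illustrates the shape of the idea, which is that each of the three plans gets handed to a registered listener and nothing is executed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExplainHookSketch {

    /** Hypothetical stand-in for a Pig plan: just operator names and I/O paths. */
    public static final class PlanSummary {
        public final List<String> operators;
        public final List<String> ioPaths;

        public PlanSummary(List<String> operators, List<String> ioPaths) {
            this.operators = operators;
            this.ioPaths = ioPaths;
        }
    }

    /** Hypothetical listener; Pig does not ship this interface. */
    public interface ExplainListener {
        void onLogicalPlan(PlanSummary plan);
        void onPhysicalPlan(PlanSummary plan);
        void onExecutionPlan(PlanSummary plan);
    }

    /** Example listener that records what it was handed. */
    public static final class CollectingListener implements ExplainListener {
        public final List<String> seen = new ArrayList<>();

        @Override public void onLogicalPlan(PlanSummary plan) { seen.add("logical " + plan.operators); }
        @Override public void onPhysicalPlan(PlanSummary plan) { seen.add("physical " + plan.operators); }
        @Override public void onExecutionPlan(PlanSummary plan) { seen.add("execution " + plan.ioPaths); }
    }

    /** Stand-in for an explain entry point: build a plan, notify listeners, never execute. */
    public static void explain(List<ExplainListener> listeners) {
        // Made-up plan contents; a real hook would pass the actual plan objects.
        List<String> ops = Arrays.asList("load", "group by", "foreach", "store");
        List<String> paths = Arrays.asList("/hypothetical/in", "/hypothetical/out");
        PlanSummary plan = new PlanSummary(ops, paths);
        for (ExplainListener l : listeners) {
            l.onLogicalPlan(plan);
            l.onPhysicalPlan(plan);
            l.onExecutionPlan(plan);
        }
    }

    public static void main(String[] args) {
        CollectingListener listener = new CollectingListener();
        explain(Arrays.<ExplainListener>asList(listener));
        for (String line : listener.seen) {
            System.out.println(line);
        }
    }
}
```

In a real patch the listener would presumably be registered through a configured param, as suggested above, and would receive Pig's own plan classes instead of the toy PlanSummary.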