-
Pig APIs
Prashant Kommireddi 2013-01-22, 01:26
I have been coding against the Pig APIs and am wondering whether anything lets you retrieve only the operators, I/O paths, etc. without actually issuing an execute or a store. Basically, I want to be able to get information after the script is parsed but before it is executed.

Thanks,
Prashant
-
Re: Pig APIs
Jonathan Coveney 2013-01-22, 01:30
What level of information would you like? I.e., if you do "explain relation", which of the three plans do you want to hook into?
-
Re: Pig APIs
Prashant Kommireddi 2013-01-22, 01:47
At the moment, basically info on I/O paths, the operators used (group by, foreach, ...), and job-level info such as the number of reducers.
-
Re: Pig APIs
Prashant Kommireddi 2013-01-22, 22:43
Jon/others - any pointers on this? I would like to patch in hooks if this is not possible at the moment.

-Prashant
-
Re: Pig APIs
Jonathan Coveney 2013-01-22, 23:05
I think this is all available; it's just not the easiest thing to get at. The explain plan has a lot of this info, and you can definitely get at it. I'm not sure whether it includes the number of reducers or whether that is only known after MR setup, but you should be able to get at that too.

That said, I don't think it would hurt to have hooks for doing something with this info more cleanly. Bill had to do things like this for Ambrose, so maybe he can weigh in on what that could look like.
-
Re: Pig APIs
Bill Graham 2013-01-23, 05:02
Yeah, getting at the info here is tricky. For Ambrose we're getting info about submitted jobs, so we can just hook into the lifecycle of PigProgressNotificationListener (PPNL). The PPNL notifiers are tightly coupled to PigStatsUtil and ScriptState, which aren't invoked during explain.

The bulk of the action for explain happens in the PigServer.explain(..) method. That's where the logical plan, physical plan and execution plan are generated, before explain is called on each of them to print the output. We could look at adding some sort of listener interface and a hook there, enabled via a configured param, that gets each of these plans passed to it during explain.

--
*Note that I'm no longer using my Yahoo! email address. Please email me at [EMAIL PROTECTED] going forward.*
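To make the listener idea above a bit more concrete, here is a minimal, self-contained Java sketch of what such a hook might look like. Everything in it is hypothetical and for illustration only: `Plan`, `ExplainListener`, `planGenerated` and the stand-in `explain` method are not Pig classes or methods, and the operator names are made-up placeholders rather than real explain output. The only point is the shape of the hook: explain builds each of the three plans and hands it to a listener instead of only printing it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ExplainHookSketch {

    /** Hypothetical stand-in for one generated plan; not a Pig class. */
    static final class Plan {
        final String kind;            // "logical", "physical" or "execution"
        final List<String> operators; // placeholder operator names

        Plan(String kind, String... operators) {
            this.kind = kind;
            this.operators = Arrays.asList(operators);
        }
    }

    /** Hypothetical callback that would receive each plan as it is generated. */
    interface ExplainListener {
        void planGenerated(Plan plan);
    }

    /**
     * Stand-in for the explain step (assumption: not Pig's implementation).
     * Builds three placeholder plans and passes each one to the listener.
     */
    static void explain(ExplainListener listener) {
        Plan[] plans = {
            new Plan("logical", "LOAD", "GROUP", "FOREACH", "STORE"),
            new Plan("physical", "LOAD", "GROUP", "FOREACH", "STORE"),
            new Plan("execution", "MAP", "REDUCE"),
        };
        for (Plan plan : plans) {
            listener.planGenerated(plan);
        }
    }

    /** Runs explain with a collecting listener; returns one summary line per plan. */
    static List<String> runDemo() {
        List<String> seen = new ArrayList<>();
        explain(plan -> seen.add(plan.kind + ": " + String.join(", ", plan.operators)));
        return seen;
    }

    public static void main(String[] args) {
        for (String line : runDemo()) {
            System.out.println(line);
        }
    }
}
```

A caller interested only in, say, operators or I/O paths would supply its own `ExplainListener` and inspect each plan as it arrives, without triggering an execute or a store. How such a listener would actually be registered (the "configured param" mentioned above) is left open here.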
-
Re: Pig APIs
Prashant Kommireddi 2013-01-23, 06:04
Thanks, Bill. Can you please point me to the Ambrose code that uses PPNL?

I will open a JIRA for adding hooks to explain.
-
Re: Pig APIs
Bill Graham 2013-01-23, 06:09
Sure, here you go:
https://github.com/twitter/ambrose/blob/master/pig/src/main/java/com/twitter/ambrose/pig/AmbrosePigProgressNotificationListener.java
-
Re: Pig APIs
Prashant Kommireddi 2013-01-23, 07:25
Thanks, Bill.