Bill Graham
2012-10-10, 17:28
Prashant Kommireddi
2012-10-10, 18:00
Bill Graham
2012-10-11, 00:54
Prashant Kommireddi
2012-10-11, 09:12
Dmitriy Ryaboy
2012-10-11, 19:27
Prashant Kommireddi
2012-10-11, 19:54
Dmitriy Ryaboy
2012-10-11, 21:28
Dmitriy Ryaboy
2012-10-11, 21:28
-
Re: PigServer API (Bill Graham, 2012-10-10, 17:28)
Hi Prashant,

[Replying to the dev list to get others' take on these...]

Just curious, why do you prefer a List of JobStats over the already existing iterator? I hesitate to add one-liner methods if it's something that can be a one-liner by the caller, unless the use case is very common.

Making getSuccessfulJobs() and getFailedJobs() public seems reasonable to me.

I'm not sure about the rationale behind the differences between registerScript and store(). Store() and registerQuery() are able to manually add to the DAG as statements come in, but register script needs parsing for execution. That's probably why execution is delegated to the GruntParser. The resulting DAG for a single-store script should be the same though. It seems like registerScript() should be able to return a list of ExecJobs.

thanks,
Bill

On Tue, Oct 9, 2012 at 11:22 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:

> Hi Bill,
>
> I am looking at PigStats and JobGraph, and am thinking of adding some
> functions. Let me know what you think.
>
> *getJobList()* returns a List representation of the iterator.
>
>     public List<JobStats> getJobList() {
>         return IteratorUtils.toList(iterator());
>     }
>
> What do you think about making getSuccessfulJobs() and getFailedJobs()
> public and exposing them in the API? Currently they are package-private.
>
> Had another question: it seems like the execution flow for
> PigServer.registerScript/Query is different from PigServer.store(). Was
> there a reason to make these different? The function store() returns an
> ExecJob, which is great for getting info regarding the runs, but
> registerScript() calls the GruntParser for execution, which I think is a
> different flow?
>
> Thanks,
> Prashant
>
> On Thu, Oct 4, 2012 at 6:05 PM, Bill Graham <[EMAIL PROTECTED]> wrote:
>
>> Makes sense to me. We could return a PigStats object.
>>
>> On Thu, Oct 4, 2012 at 1:49 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
>>
>> > Hi All,
>> >
>> > I am looking at PigServer methods for running scripts/queries and it
>> > seems like currently their return type is void, which does not tell
>> > much about job completion.
>> >
>> >     public void registerScript(InputStream in, Map<String,String> params,
>> >             List<String> paramsFiles) throws IOException {
>> >         try {
>> >             String substituted = doParamSubstitution(in, params, paramsFiles);
>> >             GruntParser grunt = new GruntParser(new StringReader(substituted));
>> >             grunt.setInteractive(false);
>> >             grunt.setParams(this);
>> >             grunt.parseStopOnError(true);
>> >         } catch (org.apache.pig.tools.pigscript.parser.ParseException e) {
>> >             log.error(e.getLocalizedMessage());
>> >             throw new IOException(e.getCause());
>> >         }
>> >     }
>> >
>> > We do have a handle on number of jobs succeeded/failed as part of the
>> > job run, so that is something we should add as return type?
>> >
>> > Thanks,
>> > Prashant

--
*Note that I'm no longer using my Yahoo! email address. Please email me at [EMAIL PROTECTED] going forward.*
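
A minimal sketch of the getJobList() idea from the quoted message, written against the standard library only. The JobStats and JobGraph classes below are hypothetical stand-ins, not Pig's real classes, and the loop replaces the IteratorUtils.toList() call from the proposal with an equivalent plain-Java copy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class JobListSketch {
    /** Hypothetical stand-in for Pig's JobStats; only a job id is modeled. */
    public static final class JobStats {
        public final String jobId;
        public JobStats(String jobId) { this.jobId = jobId; }
    }

    /** Hypothetical stand-in for a container that only exposes an iterator. */
    public static final class JobGraph implements Iterable<JobStats> {
        private final List<JobStats> jobs;

        public JobGraph(JobStats... jobs) { this.jobs = Arrays.asList(jobs); }

        @Override
        public Iterator<JobStats> iterator() { return jobs.iterator(); }

        /** Copies the iterator's elements into a new List, in iteration order. */
        public List<JobStats> getJobList() {
            List<JobStats> result = new ArrayList<>();
            for (JobStats js : this) {
                result.add(js);
            }
            return result;
        }
    }

    public static void main(String[] args) {
        JobGraph graph = new JobGraph(new JobStats("job_1"), new JobStats("job_2"));
        List<JobStats> jobs = graph.getJobList();
        // List offers size() and indexed access, which a bare iterator does not.
        System.out.println(jobs.size() + " jobs, first = " + jobs.get(0).jobId);
    }
}
```

The returned list is a fresh copy, so callers can sort or filter it without affecting the underlying graph.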
-
Re: PigServer API (Prashant Kommireddi, 2012-10-10, 18:00)
Thanks Bill.

The rationale behind providing a List is that it simply provides a lot more methods than an iterator. You are right in saying one could do that in the caller code, but I have a feeling providing this helper in the API would be beneficial. For example, a framework that is used by clients could initiate several Pig scripts/store commands at once. At the framework layer, you might want to be able to determine the total number of MR jobs spawned by these multiple scripts and query stats on those. That's just one use case; there could be more methods on List that a user could be interested in.

-Prashant
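
The framework use case described in this message could look roughly like the sketch below: several scripts each yield a list of job stats, and the framework flattens them to count total and failed jobs. JobStats, allJobs() and countFailed() are hypothetical names introduced for illustration and are not part of the Pig API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class JobAggregationSketch {
    /** Hypothetical stand-in for per-job statistics. */
    public static final class JobStats {
        public final String jobId;
        public final boolean successful;

        public JobStats(String jobId, boolean successful) {
            this.jobId = jobId;
            this.successful = successful;
        }
    }

    /** Flattens the per-script job lists into a single list. */
    public static List<JobStats> allJobs(List<List<JobStats>> perScript) {
        List<JobStats> all = new ArrayList<>();
        for (List<JobStats> jobs : perScript) {
            all.addAll(jobs);
        }
        return all;
    }

    /** Counts the jobs that did not succeed. */
    public static long countFailed(List<JobStats> jobs) {
        return jobs.stream().filter(j -> !j.successful).count();
    }

    public static void main(String[] args) {
        List<JobStats> scriptA = Arrays.asList(
                new JobStats("a1", true), new JobStats("a2", false));
        List<JobStats> scriptB = Arrays.asList(new JobStats("b1", true));

        List<JobStats> all = allJobs(Arrays.asList(scriptA, scriptB));
        System.out.println("total=" + all.size() + " failed=" + countFailed(all));
    }
}
```

With only an iterator per script, the framework would have to write the copying loop itself for every script; a List return value makes size() and addAll() available directly.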
-
Re: PigServer API (Bill Graham, 2012-10-11, 00:54)
Ok, I'm sold. :)
-
Re: PigServer API (Prashant Kommireddi, 2012-10-11, 09:12)
I knew I had those negotiation skills :)

Patch is available, please review. It's a minor one: https://issues.apache.org/jira/browse/PIG-2964

-Prashant
-
Re: PigServer API (Dmitriy Ryaboy, 2012-10-11, 19:27)
Doesn't executeBatch() return exactly what you want?
-
Re: PigServer API (Prashant Kommireddi, 2012-10-11, 19:54)
True, that would serve the purpose. However, I feel the abstraction could be at a lower level so callers of other functions such as "store" could use it too.
-
Re: PigServer API (Dmitriy Ryaboy, 2012-10-11, 21:28)
Backwards compatibility is an issue..
-
Re: PigServer API (Dmitriy Ryaboy, 2012-10-11, 21:28)
Actually, no it's not, if all we are changing is the return type from void to something better. Carry on.
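
The point about changing a void return type can be illustrated with a small sketch. Server and RunStats below are hypothetical stand-ins, not PigServer or PigStats, and the job-counting rule is invented purely so the example has something to return. Existing call sites that ignore the result keep compiling unchanged, while new callers can read it.

```java
public class ReturnTypeSketch {
    /** Hypothetical result type standing in for whatever stats object is returned. */
    public static final class RunStats {
        public final int jobCount;
        public RunStats(int jobCount) { this.jobCount = jobCount; }
    }

    /** Hypothetical server whose method was previously declared void. */
    public static final class Server {
        public RunStats registerScript(String script) {
            // Illustration only: pretend each non-blank line spawns one job.
            int jobs = 0;
            for (String line : script.split("\n")) {
                if (!line.trim().isEmpty()) {
                    jobs++;
                }
            }
            return new RunStats(jobs);
        }
    }

    public static void main(String[] args) {
        Server server = new Server();

        // Old-style call site: the result is simply ignored, as with void.
        server.registerScript("a = load 'x';");

        // New-style call site: the caller inspects the returned stats.
        RunStats stats = server.registerScript("a = load 'x';\nstore a into 'y';");
        System.out.println("jobs=" + stats.jobCount);
    }
}
```

One caveat worth keeping in mind: this is compatibility at the source level. The JVM method descriptor includes the return type, so callers compiled against the void version would need to be recompiled against the new signature.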