Pig, mail # dev - Re: PigServer API


Re: PigServer API
Prashant Kommireddi 2012-10-10, 18:00
Thanks Bill.

The rationale for providing a List is that it offers many more methods
than an Iterator. You are right that the conversion could be done in the
caller code, but I think providing this helper in the API would be
beneficial. For example, a framework used by clients could launch several
Pig scripts/store commands at once. At the framework layer, you might want
to determine the total number of MR jobs spawned by those scripts and query
stats on them. That's just one use case; there could be other List methods
that a user is interested in.
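
To make the framework use case concrete, here is a rough sketch. It does
not use the real Pig classes: JobStats and ScriptStats below are
hypothetical stand-ins that only model "an iterator over per-job stats",
and totalJobs()/failedJobs() are names made up for illustration. Only
getJobList() corresponds to the helper being proposed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class JobListSketch {

    /** Hypothetical stand-in for Pig's JobStats; holds only what this sketch needs. */
    static final class JobStats {
        final String jobId;
        final boolean successful;

        JobStats(String jobId, boolean successful) {
            this.jobId = jobId;
            this.successful = successful;
        }
    }

    /** Hypothetical stand-in for the stats of one script run, exposed as an iterator. */
    static final class ScriptStats implements Iterable<JobStats> {
        private final List<JobStats> jobs;

        ScriptStats(JobStats... jobs) {
            this.jobs = Arrays.asList(jobs);
        }

        @Override
        public Iterator<JobStats> iterator() {
            return jobs.iterator();
        }

        /** The proposed helper: copy the iterator's elements into a List. */
        List<JobStats> getJobList() {
            List<JobStats> list = new ArrayList<>();
            for (JobStats js : this) {
                list.add(js);
            }
            return list;
        }
    }

    /** Framework-level helper: total number of jobs across several script runs. */
    static int totalJobs(List<ScriptStats> runs) {
        int total = 0;
        for (ScriptStats run : runs) {
            total += run.getJobList().size();
        }
        return total;
    }

    /** Framework-level helper: number of unsuccessful jobs across several script runs. */
    static int failedJobs(List<ScriptStats> runs) {
        int failed = 0;
        for (ScriptStats run : runs) {
            for (JobStats js : run.getJobList()) {
                if (!js.successful) {
                    failed++;
                }
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<ScriptStats> runs = Arrays.asList(
                new ScriptStats(new JobStats("job_1", true), new JobStats("job_2", false)),
                new ScriptStats(new JobStats("job_3", true)));
        System.out.println("total jobs:  " + totalJobs(runs));
        System.out.println("failed jobs: " + failedJobs(runs));
    }
}
```

With a List in hand, size(), indexing and sub-lists come for free, which
is the convenience being argued for; the same totals can of course be
computed by walking the iterator directly in the caller.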

-Prashant

On Wed, Oct 10, 2012 at 10:28 AM, Bill Graham <[EMAIL PROTECTED]> wrote:

> Hi Prashant,
>
> [Replying to the dev list to get others' take on these...]
>
> Just curious, why do you prefer a List of JobStats over the already
> existing iterator? I hesitate to add one-liner methods if it's something
> that can be a one-liner by the caller, unless the use case is very common.
>
> Making getSuccessfulJobs() and getFailedJobs() public seems reasonable to
> me.
>
> I'm not sure about the rationale behind the differences between
> registerScript() and store(). store() and registerQuery() are able to
> manually add to the DAG as statements come in, but registerScript() needs
> to parse the script before execution. That's probably why execution is
> delegated to the GruntParser. The resulting DAG for a single-store script
> should be the same though. It seems like registerScript() should be able
> to return a list of ExecJobs.
>
> thanks,
> Bill
>
>
> On Tue, Oct 9, 2012 at 11:22 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
>
>> Hi Bill,
>>
>> I am looking at PigStats and JobGraph, and am thinking of adding some
>> functions. Let me know what you think.
>>
>> *getJobList()* returns a List representation of the iterator.
>>
>> public List<JobStats> getJobList() {
>>             return IteratorUtils.toList(iterator());
>> }
>>
>> What do you think about making getSuccessfulJobs() and getFailedJobs()
>> public and exposing them in the API? Currently they are package-private.
>>
>> I had another question: it seems like the execution flow for
>> PigServer.registerScript/Query is different from PigServer.store(). Was
>> there a reason to make these different? The function store() returns an
>> ExecJob, which is great for getting info about the runs, but
>> registerScript() calls the GruntParser for execution, which I think is a
>> different flow?
>>
>> Thanks,
>> Prashant
>>
>>
>> On Thu, Oct 4, 2012 at 6:05 PM, Bill Graham <[EMAIL PROTECTED]> wrote:
>>
>>> Makes sense to me. We could return a PigStats object.
>>>
>>> On Thu, Oct 4, 2012 at 1:49 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
>>>
>>> > Hi All,
>>> >
>>> > I am looking at PigServer methods for running scripts/queries, and it
>>> > seems that their return type is currently void, which does not tell
>>> > much about job completion.
>>> >
>>> >     public void registerScript(InputStream in, Map<String,String> params,
>>> >             List<String> paramsFiles) throws IOException {
>>> >         try {
>>> >             String substituted = doParamSubstitution(in, params, paramsFiles);
>>> >             GruntParser grunt = new GruntParser(new StringReader(substituted));
>>> >             grunt.setInteractive(false);
>>> >             grunt.setParams(this);
>>> >             grunt.parseStopOnError(true);
>>> >         } catch (org.apache.pig.tools.pigscript.parser.ParseException e) {
>>> >             log.error(e.getLocalizedMessage());
>>> >             throw new IOException(e.getCause());
>>> >         }
>>> >     }
>>> >
>>> >
>>> > We do have a handle on the number of jobs that succeeded/failed as
>>> > part of the job run, so is that something we should add as the return
>>> > type?
>>> >
>>> > Thanks,
>>> > Prashant
>>> >
>>>
>>>
>>>
>>> --
>>> *Note that I'm no longer using my Yahoo! email address. Please email me
>>> at [EMAIL PROTECTED] going forward.*
>>>
>>
>>
>
>
> --
> *Note that I'm no longer using my Yahoo! email address. Please email me