Pig >> mail # dev >> Re: PigServer API


I knew I had those negotiation skills :)

A patch is available; please review. It's a minor one:
https://issues.apache.org/jira/browse/PIG-2964
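
For anyone skimming the thread, here is a rough, standalone sketch of the use case discussed below: turning a job iterator into a List and counting jobs across several scripts at the framework layer. It is not the patch itself. JobStats here is a hypothetical stand-in for Pig's class, and toList, totalJobs, and failedJobs are illustrative names, not Pig API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class JobListSketch {

    // Hypothetical stand-in for Pig's JobStats; it carries only what this sketch needs.
    static final class JobStats {
        final String jobId;
        final boolean successful;

        JobStats(String jobId, boolean successful) {
            this.jobId = jobId;
            this.successful = successful;
        }
    }

    // Copies everything the iterator yields into a new List (the idea behind getJobList()).
    static <T> List<T> toList(Iterator<T> it) {
        List<T> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    // Total number of jobs spawned across several scripts.
    static int totalJobs(List<List<JobStats>> jobsPerScript) {
        int total = 0;
        for (List<JobStats> jobs : jobsPerScript) {
            total += jobs.size();
        }
        return total;
    }

    // Number of jobs, across all scripts, that did not succeed.
    static int failedJobs(List<List<JobStats>> jobsPerScript) {
        int failed = 0;
        for (List<JobStats> jobs : jobsPerScript) {
            for (JobStats job : jobs) {
                if (!job.successful) {
                    failed++;
                }
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Two made-up scripts: one spawned two jobs, the other spawned one.
        List<JobStats> scriptA = toList(Arrays.asList(
                new JobStats("job_a1", true),
                new JobStats("job_a2", false)).iterator());
        List<JobStats> scriptB = toList(Arrays.asList(
                new JobStats("job_b1", true)).iterator());

        List<List<JobStats>> all = Arrays.asList(scriptA, scriptB);
        System.out.println("total jobs:  " + totalJobs(all));
        System.out.println("failed jobs: " + failedJobs(all));
    }
}
```

With a List in hand, size(), indexing, and iteration over the same results more than once come for free, which is the convenience argued for in the thread.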

-Prashant

On Wed, Oct 10, 2012 at 5:54 PM, Bill Graham <[EMAIL PROTECTED]> wrote:

> Ok, I'm sold. :)
>
>
> On Wed, Oct 10, 2012 at 11:00 AM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
>
>> Thanks Bill.
>>
>> The rationale for providing a List is that it simply offers many more
>> methods than an Iterator. You are right that one could do the conversion
>> in the caller code, but I have a feeling that providing this helper in the
>> API would be beneficial. For example, a framework used by clients could
>> initiate several Pig scripts/store commands at once. At the framework
>> layer, you might want to determine the total number of MR jobs spawned by
>> these scripts and query stats on them. That's just one use case; there
>> could be other List methods that a user would be interested in.
>>
>> -Prashant
>>
>>
>> On Wed, Oct 10, 2012 at 10:28 AM, Bill Graham <[EMAIL PROTECTED]> wrote:
>>
>>> Hi Prashant,
>>>
>>> [Replying to the dev list to get others' take on these...]
>>>
>>> Just curious, why do you prefer a List of JobStats over the already
>>> existing iterator? I hesitate to add one-liner methods if it's something
>>> that can be a one-liner by the caller, unless the use case is very common.
>>>
>>> Making getSuccessfulJobs() and getFailedJobs() public seems reasonable
>>> to me.
>>>
>>> I'm not sure about the rationale behind the differences between
>>> registerScript() and store(). store() and registerQuery() are able to
>>> manually add to the DAG as statements come in, but registerScript() needs
>>> the script to be parsed before execution. That's probably why execution is
>>> delegated to the GruntParser. The resulting DAG for a single-store script
>>> should be the same, though. It seems like registerScript() should be able
>>> to return a list of ExecJobs.
>>>
>>> thanks,
>>> Bill
>>>
>>>
>>> On Tue, Oct 9, 2012 at 11:22 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hi Bill,
>>>>
>>>> I am looking at PigStats and JobGraph, and am thinking of adding some
>>>> functions. Let me know what you think.
>>>>
>>>> *getJobList()* returns a List representation of the iterator.
>>>>
>>>> public List<JobStats> getJobList() {
>>>>     return IteratorUtils.toList(iterator());
>>>> }
>>>>
>>>> What do you think about making getSuccessfulJobs() and getFailedJobs()
>>>> public and exposing them in the API? They are currently package-private.
>>>>
>>>> I had another question: it seems the execution flow for
>>>> PigServer.registerScript/Query is different from PigServer.store(). Was
>>>> there a reason to make these different? store() returns an ExecJob, which
>>>> is great for getting information about the runs, but registerScript()
>>>> calls the GruntParser for execution, which I think is a different flow.
>>>>
>>>> Thanks,
>>>> Prashant
>>>>
>>>>
>>>> On Thu, Oct 4, 2012 at 6:05 PM, Bill Graham <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Makes sense to me. We could return a PigStats object.
>>>>>
>>>>> On Thu, Oct 4, 2012 at 1:49 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>> > Hi All,
>>>>> >
>>>>> > I am looking at the PigServer methods for running scripts/queries, and
>>>>> > it seems their return type is currently void, which does not tell the
>>>>> > caller much about job completion.
>>>>> >
>>>>> >     public void registerScript(InputStream in, Map<String,String> params,
>>>>> >             List<String> paramsFiles) throws IOException {
>>>>> >         try {
>>>>> >             String substituted = doParamSubstitution(in, params, paramsFiles);
>>>>> >             GruntParser grunt = new GruntParser(new StringReader(substituted));
>>>>> >             grunt.setInteractive(false);
>>>>> >             grunt.setParams(this);
>>>>> >             grunt.parseStopOnError(true);
>>>>> >         } catch (org.apache.pig.tools.pigscript.parser.ParseException e) {