Pig >> mail # dev >> Re: PigServer API


Re: PigServer API
Ok, I'm sold. :)

On Wed, Oct 10, 2012 at 11:00 AM, Prashant Kommireddi
<[EMAIL PROTECTED]> wrote:

> Thanks Bill.
>
> The rationale behind providing a List is that it simply provides a lot
> more methods than an iterator. You are right that one could do that in
> the caller code, but I have a feeling providing this helper in the API would be
> beneficial. For example, a framework that is used by clients could initiate
> several Pig scripts/store commands at once. At the framework layer, you
> might want to be able to determine the total number of MR jobs spawned
> by these multiple scripts and query stats on those. That's just one
> use case; there could be more methods on List that a user could be
> interested in.
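To make the framework use case concrete, here is a minimal sketch of how a caller might combine per-script job lists once each script exposes a `List` of job stats. It assumes a hypothetical stand-in `JobStats` (only a job id and a success flag), not Pig's actual class; `JobAggregation`, `allJobs`, `totalJobs` and `failedJobIds` are illustrative names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class JobAggregation {

    // Hypothetical stand-in for Pig's JobStats: only a job id and a success flag.
    static final class JobStats {
        final String jobId;
        final boolean successful;

        JobStats(String jobId, boolean successful) {
            this.jobId = jobId;
            this.successful = successful;
        }
    }

    // Flattens the per-script job lists into one list covering every MR job spawned.
    static List<JobStats> allJobs(List<List<JobStats>> perScript) {
        List<JobStats> all = new ArrayList<>();
        for (List<JobStats> jobs : perScript) {
            all.addAll(jobs);
        }
        return all;
    }

    // Total number of MR jobs across all scripts: a plain size() call on a List.
    static int totalJobs(List<List<JobStats>> perScript) {
        return allJobs(perScript).size();
    }

    // Ids of the jobs that failed, across all scripts.
    static List<String> failedJobIds(List<List<JobStats>> perScript) {
        return allJobs(perScript).stream()
                .filter(j -> !j.successful)
                .map(j -> j.jobId)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<JobStats>> perScript = Arrays.asList(
                Arrays.asList(new JobStats("job_1", true), new JobStats("job_2", false)),
                Arrays.asList(new JobStats("job_3", true)));
        System.out.println(totalJobs(perScript));    // 3
        System.out.println(failedJobIds(perScript)); // [job_2]
    }
}
```

With an iterator alone, each of these helpers would first have to drain it; with a List, `size()`, `addAll()` and streams are available directly, which is the convenience argued for above.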
>
> -Prashant
>
>
> On Wed, Oct 10, 2012 at 10:28 AM, Bill Graham <[EMAIL PROTECTED]> wrote:
>
>> Hi Prashant,
>>
>> [Replying to the dev list to get others' take on these...]
>>
>> Just curious, why do you prefer a List of JobStats over the already
>> existing iterator? I hesitate to add one-liner methods if it's something
>> that can be a one-liner for the caller, unless the use case is very common.
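For comparison, the caller-side one-liner mentioned here might look like the following sketch, which uses only the JDK (Java 8's `Iterator.forEachRemaining`) instead of Commons Collections' `IteratorUtils.toList`. The class and method names are illustrative, not part of Pig.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IteratorToList {

    // Caller-side equivalent of IteratorUtils.toList(iterator):
    // drains the iterator, in order, into a new ArrayList.
    static <T> List<T> toList(Iterator<T> it) {
        List<T> out = new ArrayList<>();
        it.forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        // Any iterator works; in the thread it would be the iterator() that getJobList() wraps.
        Iterator<String> jobs = Arrays.asList("job_1", "job_2").iterator();
        List<String> list = toList(jobs);
        System.out.println(list.size()); // 2
    }
}
```

The trade-off under discussion is visible here: the conversion is one call for the caller, so a built-in `getJobList()` mainly saves each client from repeating it.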
>>
>> Making getSuccessfulJobs() and getFailedJobs() public seems reasonable to
>> me.
>>
>> I'm not sure about the rationale behind the differences between
>> registerScript() and store(). store() and registerQuery() are able to
>> manually add to the DAG as statements come in, but registerScript() needs
>> parsing for execution. That's probably why execution is delegated to the
>> GruntParser. The resulting DAG for a single-store script should be the same,
>> though. It seems like registerScript() should be able to return a list of
>> ExecJobs.
>>
>> thanks,
>> Bill
>>
>>
>> On Tue, Oct 9, 2012 at 11:22 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
>>
>>> Hi Bill,
>>>
>>> I am looking at PigStats and JobGraph, and am thinking of adding some
>>> functions. Let me know what you think.
>>>
>>> *getJobList()* returns a List representation of the iterator.
>>>
>>> public List<JobStats> getJobList() {
>>>     return IteratorUtils.toList(iterator());
>>> }
>>>
>>> What do you think about making getSuccessfulJobs() and getFailedJobs()
>>> public and exposing them in the API? Currently they are package-private.
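The following is a sketch of what the proposed public accessors could look like. It is built on a hypothetical stand-in `JobGraphSketch` that implements `Iterable` over an equally hypothetical `JobStats`; Pig's real `JobGraph` and `JobStats` are not reproduced here, and the `List` return types are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for Pig's JobGraph: an Iterable over job stats,
// with getJobList(), getSuccessfulJobs() and getFailedJobs() made public.
public class JobGraphSketch implements Iterable<JobGraphSketch.JobStats> {

    // Minimal stand-in for JobStats: a job id and a success flag.
    public static final class JobStats {
        public final String jobId;
        public final boolean successful;

        public JobStats(String jobId, boolean successful) {
            this.jobId = jobId;
            this.successful = successful;
        }
    }

    private final List<JobStats> jobs = new ArrayList<>();

    public void add(JobStats stats) {
        jobs.add(stats);
    }

    // Read-only iterator, so callers cannot remove jobs through it.
    @Override
    public Iterator<JobStats> iterator() {
        return Collections.unmodifiableList(jobs).iterator();
    }

    // Read-only List view of all jobs, as proposed for getJobList().
    public List<JobStats> getJobList() {
        return Collections.unmodifiableList(jobs);
    }

    // Public versions of the accessors that are currently package-private.
    public List<JobStats> getSuccessfulJobs() {
        return filter(true);
    }

    public List<JobStats> getFailedJobs() {
        return filter(false);
    }

    // Collects the jobs whose success flag matches wantSuccess, in insertion order.
    private List<JobStats> filter(boolean wantSuccess) {
        List<JobStats> out = new ArrayList<>();
        for (JobStats j : jobs) {
            if (j.successful == wantSuccess) {
                out.add(j);
            }
        }
        return out;
    }
}
```

Returning unmodifiable views keeps the graph's internal state safe once these methods become part of the public API.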
>>>
>>> I had another question: it seems like the execution flow for
>>> PigServer.registerScript/Query is different from PigServer.store(). Was
>>> there a reason to make these different? The function store() returns an
>>> ExecJob, which is great for getting info about the runs, but registerScript()
>>> calls the GruntParser for execution, which I think is a different flow.
>>>
>>> Thanks,
>>> Prashant
>>>
>>>
>>> On Thu, Oct 4, 2012 at 6:05 PM, Bill Graham <[EMAIL PROTECTED]> wrote:
>>>
>>>> Makes sense to me. We could return a PigStats object.
>>>>
>>>> On Thu, Oct 4, 2012 at 1:49 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
>>>>
>>>> > Hi All,
>>>> >
>>>> > I am looking at PigServer methods for running scripts/queries, and it seems
>>>> > like their return type is currently void, which does not tell much about job
>>>> > completion.
>>>> >
>>>> >     public void registerScript(InputStream in, Map<String, String> params,
>>>> >             List<String> paramsFiles) throws IOException {
>>>> >         try {
>>>> >             String substituted = doParamSubstitution(in, params, paramsFiles);
>>>> >             GruntParser grunt = new GruntParser(new StringReader(substituted));
>>>> >             grunt.setInteractive(false);
>>>> >             grunt.setParams(this);
>>>> >             grunt.parseStopOnError(true);
>>>> >         } catch (org.apache.pig.tools.pigscript.parser.ParseException e) {
>>>> >             log.error(e.getLocalizedMessage());
>>>> >             throw new IOException(e.getCause());
>>>> >         }
>>>> >     }
>>>> >
>>>> >
>>>> > We do have a handle on the number of jobs that succeeded/failed as part of
>>>> > the job run, so is that something we should add as the return type?
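As an illustration of the idea of returning something other than void, here is a sketch built on a hypothetical `RunSummary` type holding succeeded/failed counts. It is not Pig's API; in the thread Bill suggests returning a PigStats object instead, and `ReturnStatsSketch` and `summarize` are illustrative names only.

```java
import java.util.Arrays;
import java.util.List;

public class ReturnStatsSketch {

    // Hypothetical summary a script-running method could return instead of void.
    static final class RunSummary {
        final int succeeded;
        final int failed;

        RunSummary(int succeeded, int failed) {
            this.succeeded = succeeded;
            this.failed = failed;
        }

        boolean allSucceeded() {
            return failed == 0;
        }
    }

    // Hypothetical helper: counts per-job outcomes collected during the run.
    static RunSummary summarize(List<Boolean> jobOutcomes) {
        int ok = 0;
        for (boolean success : jobOutcomes) {
            if (success) {
                ok++;
            }
        }
        return new RunSummary(ok, jobOutcomes.size() - ok);
    }

    public static void main(String[] args) {
        // A caller can branch on the result instead of getting nothing back.
        RunSummary s = summarize(Arrays.asList(true, true, false));
        System.out.println(s.succeeded + " succeeded, " + s.failed + " failed"); // 2 succeeded, 1 failed
        System.out.println(s.allSucceeded()); // false
    }
}
```

A richer return value such as PigStats would carry the same counts plus per-job detail; the point of the sketch is only that the caller gets a result it can inspect.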
*Note that I'm no longer using my Yahoo! email address. Please email me at
[EMAIL PROTECTED] going forward.*