Re: PigServer API
Hi Prashant,

[Replying to the dev list to get others' take on these...]

Just curious, why do you prefer a List of JobStats over the already
existing iterator? I hesitate to add one-liner methods if it's something
that can be a one-liner by the caller, unless the use case is very common.
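
For illustration, the caller-side conversion could look roughly like the sketch below. It assumes only a plain java.util.Iterator; the class name IteratorToList and the strings standing in for JobStats objects are hypothetical and not part of Pig.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IteratorToList {
    // Drains any iterator into a new ArrayList, preserving iteration order.
    static <T> List<T> toList(Iterator<T> it) {
        List<T> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        // Strings stand in for JobStats objects (assumption for this sketch).
        Iterator<String> jobs = Arrays.asList("job_1", "job_2").iterator();
        List<String> jobList = toList(jobs);
        System.out.println(jobList);
    }
}
```

With a helper like this (or IteratorUtils.toList from Commons Collections, as in the proposal quoted below), the caller gets a List without any new method on the stats classes.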

Making getSuccessfulJobs() and getFailedJobs() public seems reasonable to
me.

I'm not sure about the rationale behind the differences between
registerScript() and store(). store() and registerQuery() are able to
add to the DAG incrementally as statements come in, but registerScript()
needs to parse the script before it can execute. That's probably why
execution is delegated to the GruntParser. The resulting DAG for a
single-store script should be the same, though, so it seems like
registerScript() should be able to return a list of ExecJobs.

thanks,
Bill

On Tue, Oct 9, 2012 at 11:22 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:

> Hi Bill,
>
> I am looking at PigStats and JobGraph, and am thinking of adding some
> functions. Let me know what you think.
>
> *getJobList()* returns a List representation of the iterator.
>
> public List<JobStats> getJobList() {
>     return IteratorUtils.toList(iterator());
> }
>
> What do you think about making getSuccessfulJobs() and getFailedJobs()
> public and exposing them in the API? Currently they are package-private?
>
> I had another question: it seems like the execution flow for
> PigServer.registerScript/Query is different from PigServer.store(). Was
> there a reason to make these different? The function store() returns an
> ExecJob, which is great for getting info about the runs, but registerScript()
> calls the GruntParser for execution, which I think is a different flow?
>
> Thanks,
> Prashant
>
>
> On Thu, Oct 4, 2012 at 6:05 PM, Bill Graham <[EMAIL PROTECTED]> wrote:
>
>> Makes sense to me. We could return a PigStats object.
>>
>> On Thu, Oct 4, 2012 at 1:49 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
>>
>> > Hi All,
>> >
>> > I am looking at PigServer methods for running scripts/queries and it seems
>> > like currently their return type is void, which does not tell much about
>> > job completion.
>> >
>> >     public void registerScript(InputStream in, Map<String,String> params,
>> >             List<String> paramsFiles) throws IOException {
>> >         try {
>> >             String substituted = doParamSubstitution(in, params, paramsFiles);
>> >             GruntParser grunt = new GruntParser(new StringReader(substituted));
>> >             grunt.setInteractive(false);
>> >             grunt.setParams(this);
>> >             grunt.parseStopOnError(true);
>> >         } catch (org.apache.pig.tools.pigscript.parser.ParseException e) {
>> >             log.error(e.getLocalizedMessage());
>> >             throw new IOException(e.getCause());
>> >         }
>> >     }
>> >
>> >
>> > We do have a handle on the number of jobs succeeded/failed as part of the
>> > job run, so is that something we should add as the return type?
>> >
>> > Thanks,
>> > Prashant
>> >
>>
>>
>>
>> --
>> *Note that I'm no longer using my Yahoo! email address. Please email me at
>> [EMAIL PROTECTED] going forward.*
>>
>
>