Pig >> mail # dev >> Re: PigServer API


Re: PigServer API
Backwards compatibility is an issue.

On Thu, Oct 11, 2012 at 12:54 PM, Prashant Kommireddi
<[EMAIL PROTECTED]> wrote:
> True, that would serve the purpose. However, I feel the
> abstraction could be at a lower level so callers of other functions such as
> "store" could use it too.
>
> On Thu, Oct 11, 2012 at 12:27 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
>
>> Doesn't executeBatch() return exactly what you want?
>>
>>
>>
>> On Thu, Oct 11, 2012 at 2:12 AM, Prashant Kommireddi
>> <[EMAIL PROTECTED]> wrote:
>> > I knew I had those negotiation skills :)
>> >
>> > Patch is available, please review. It's a minor one
>> > https://issues.apache.org/jira/browse/PIG-2964
>> >
>> > -Prashant
>> >
>> > On Wed, Oct 10, 2012 at 5:54 PM, Bill Graham <[EMAIL PROTECTED]> wrote:
>> >
>> >> Ok, I'm sold. :)
>> >>
>> >>
>> >> On Wed, Oct 10, 2012 at 11:00 AM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
>> >>
>> >>> Thanks Bill.
>> >>>
>> >>> The rationale behind providing a List is that it simply provides a lot
>> >>> more methods than an iterator. You are right in saying one could do that
>> >>> in the caller code, but I have a feeling providing this helper in the API
>> >>> would be beneficial. For example, a framework that is used by clients
>> >>> could initiate several pig scripts/store commands at once. At the
>> >>> framework layer, you might want to be able to determine the total number
>> >>> of MR jobs spawned by these multiple scripts and query stats on those.
>> >>> That's just one use case; there could be more methods on List that a
>> >>> user could be interested in.
>> >>>
>> >>> -Prashant
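
The framework use case described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `JobStats` here is a hypothetical stand-in class (not Pig's real `JobStats`), and `totalJobs` and `failedJobs` are made-up helper names. It shows what a caller might do once each script's jobs are available as a `List`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FrameworkStatsSketch {
    /** Hypothetical stand-in for Pig's JobStats: a job id plus a success flag. */
    static final class JobStats {
        final String jobId;
        final boolean successful;

        JobStats(String jobId, boolean successful) {
            this.jobId = jobId;
            this.successful = successful;
        }
    }

    /** Total number of MR jobs across all scripts, using List.size(). */
    static int totalJobs(List<List<JobStats>> jobsPerScript) {
        int total = 0;
        for (List<JobStats> jobs : jobsPerScript) {
            total += jobs.size();
        }
        return total;
    }

    /** Collects the failed jobs from every script into one list. */
    static List<JobStats> failedJobs(List<List<JobStats>> jobsPerScript) {
        List<JobStats> failed = new ArrayList<>();
        for (List<JobStats> jobs : jobsPerScript) {
            for (JobStats job : jobs) {
                if (!job.successful) {
                    failed.add(job);
                }
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Two hypothetical scripts, each with its own list of jobs.
        List<JobStats> scriptA = Arrays.asList(
                new JobStats("job_a1", true), new JobStats("job_a2", false));
        List<JobStats> scriptB = Arrays.asList(new JobStats("job_b1", true));
        List<List<JobStats>> all = Arrays.asList(scriptA, scriptB);

        System.out.println("total jobs: " + totalJobs(all));
        System.out.println("failed jobs: " + failedJobs(all).size());
    }
}
```

With only an iterator, the caller would first have to walk each one to count or collect jobs; with lists, `size()` and ordinary iteration are available directly.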
>> >>>
>> >>>
>> >>> On Wed, Oct 10, 2012 at 10:28 AM, Bill Graham <[EMAIL PROTECTED]> wrote:
>> >>>
>> >>>> Hi Prashant,
>> >>>>
>> >>>> [Replying to the dev list to get others' take on these...]
>> >>>>
>> >>>> Just curious, why do you prefer a List of JobStats over the already
>> >>>> existing iterator? I hesitate to add one-liner methods if it's something
>> >>>> that can be a one-liner by the caller, unless the use case is very common.
>> >>>>
>> >>>> Making getSuccessfulJobs() and getFailedJobs() public seems reasonable
>> >>>> to me.
>> >>>>
>> >>>> I'm not sure about the rationale behind the differences between
>> >>>> registerScript() and store(). store() and registerQuery() are able to
>> >>>> manually add to the DAG as statements come in, but registerScript() needs
>> >>>> parsing for execution. That's probably why execution is delegated to the
>> >>>> GruntParser. The resulting DAG for a single-store script should be the
>> >>>> same though. It seems like registerScript() should be able to return a
>> >>>> list of ExecJobs.
>> >>>>
>> >>>> thanks,
>> >>>> Bill
>> >>>>
>> >>>>
>> >>>> On Tue, Oct 9, 2012 at 11:22 PM, Prashant Kommireddi <
>> >>>> [EMAIL PROTECTED]> wrote:
>> >>>>
>> >>>>> Hi Bill,
>> >>>>>
>> >>>>> I am looking at PigStats and JobGraph, and am thinking of adding some
>> >>>>> functions. Let me know what you think.
>> >>>>>
>> >>>>> *getJobList()* returns a List representation of the iterator.
>> >>>>>
>> >>>>> public List<JobStats> getJobList() {
>> >>>>>     return IteratorUtils.toList(iterator());
>> >>>>> }
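
For readers without the utility class at hand (`IteratorUtils` is presumably the Apache Commons Collections helper), the same idea can be sketched with only the JDK. The `JobStats` and `JobGraph` classes below are hypothetical stand-ins that model just enough to run; they are not Pig's actual classes, and the real `JobGraph` may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class JobListSketch {
    /** Hypothetical stand-in for Pig's JobStats; only a job id is modeled. */
    static final class JobStats {
        final String jobId;

        JobStats(String jobId) {
            this.jobId = jobId;
        }
    }

    /** Hypothetical stand-in for JobGraph: something iterable over its jobs. */
    static final class JobGraph implements Iterable<JobStats> {
        private final List<JobStats> jobs;

        JobGraph(List<JobStats> jobs) {
            this.jobs = jobs;
        }

        @Override
        public Iterator<JobStats> iterator() {
            return jobs.iterator();
        }

        /** Copies the iterator's elements into a new list, in iteration order. */
        public List<JobStats> getJobList() {
            List<JobStats> result = new ArrayList<>();
            for (JobStats job : this) {
                result.add(job);
            }
            return result;
        }
    }

    public static void main(String[] args) {
        JobGraph graph = new JobGraph(Arrays.asList(
                new JobStats("job_1"), new JobStats("job_2")));
        List<JobStats> jobs = graph.getJobList();
        System.out.println("jobs in graph: " + jobs.size());
    }
}
```

The returned list is a fresh copy, so callers can sort or filter it without affecting the underlying graph.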
>> >>>>>
>> >>>>> What do you think about making getSuccessfulJobs() and getFailedJobs()
>> >>>>> public and exposing them in the API? Currently they are package-private.
>> >>>>>
>> >>>>> Had another question: it seems like the execution flow for
>> >>>>> PigServer.registerScript/Query is different from PigServer.store(). Was
>> >>>>> there a reason to make these different? The function store() returns an
>> >>>>> ExecJob, which is great for getting info regarding the runs, but
>> >>>>> registerScript() calls the GruntParser for execution, which I think is a
>> >>>>> different flow?
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Prashant
>> >>>>>
>> >>>>>
>> >>>>> On Thu, Oct 4, 2012 at 6:05 PM, Bill Graham <[EMAIL PROTECTED]> wrote:
>> >>>>>
>> >>>>>> Makes sense to me. We could return a PigStats object.