Pig >> mail # dev >> Re: PigServer API
Re: PigServer API
Actually, no, it's not, if all we are changing is the return type from void
to something better. Carry on.
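A minimal Java sketch of the point above. It assumes a hypothetical method runAll() whose return type changes from void to a List of job IDs. A call statement that ignores the result compiles the same way against either signature, so existing callers need no source changes. This covers source compatibility only: the JVM method descriptor includes the return type, so code already compiled against the void version has to be recompiled.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical example: runAll() previously returned void.
public class ReturnTypeChange {

    // New signature: same name and parameters, now returns the job IDs it ran.
    public static List<String> runAll() {
        return Arrays.asList("job_1", "job_2");
    }

    public static void main(String[] args) {
        // Caller written for the old void signature: still compiles unchanged.
        runAll();

        // New caller that takes advantage of the return value.
        List<String> jobs = runAll();
        System.out.println(jobs.size() + " jobs"); // prints "2 jobs"
    }
}
```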

On Thu, Oct 11, 2012 at 2:28 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
> Backwards compatibility is an issue.
>
> On Thu, Oct 11, 2012 at 12:54 PM, Prashant Kommireddi
> <[EMAIL PROTECTED]> wrote:
>> True, that would serve the purpose. However, I feel the abstraction
>> could be at a lower level so that callers of other functions, such as
>> "store", could use it too.
>>
>> On Thu, Oct 11, 2012 at 12:27 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
>>
>>> Doesn't executeBatch() return exactly what you want?
>>>
>>>
>>>
>>> On Thu, Oct 11, 2012 at 2:12 AM, Prashant Kommireddi
>>> <[EMAIL PROTECTED]> wrote:
>>> > I knew I had those negotiation skills :)
>>> >
>>> > The patch is available; please review. It's a minor one:
>>> > https://issues.apache.org/jira/browse/PIG-2964
>>> >
>>> > -Prashant
>>> >
>>> > On Wed, Oct 10, 2012 at 5:54 PM, Bill Graham <[EMAIL PROTECTED]> wrote:
>>> >
>>> >> Ok, I'm sold. :)
>>> >>
>>> >>
>>> >> On Wed, Oct 10, 2012 at 11:00 AM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
>>> >>
>>> >>> Thanks Bill.
>>> >>>
>>> >>> The rationale behind providing a List is simply that it offers many
>>> >>> more methods than an iterator. You are right that one could do the
>>> >>> conversion in the caller code, but I have a feeling that providing this
>>> >>> helper in the API would be beneficial. For example, a framework used by
>>> >>> clients could initiate several Pig scripts/store commands at once. At
>>> >>> the framework layer, you might want to determine the total number of MR
>>> >>> jobs spawned by these scripts and query stats on them. That's just one
>>> >>> use case; there could be other List methods a user is interested in.
>>> >>>
>>> >>> -Prashant
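The framework use case above, sketched in Java under stated assumptions: RunStats is a hypothetical stand-in for a per-script stats object that exposes its jobs through a List-returning getJobList() like the one proposed in this thread, and job IDs are plain strings instead of JobStats objects. With a List, each per-run count is a single size() call.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch only: RunStats is a hypothetical stand-in, not a Pig class.
public class TotalJobs {

    public static final class RunStats {
        private final List<String> jobIds;

        public RunStats(List<String> jobIds) { this.jobIds = jobIds; }

        // Read-only List view of the jobs spawned by one script run.
        public List<String> getJobList() {
            return Collections.unmodifiableList(jobIds);
        }
    }

    // Total number of jobs spawned across several script runs.
    public static int totalJobs(List<RunStats> runs) {
        return runs.stream().mapToInt(r -> r.getJobList().size()).sum();
    }

    public static void main(String[] args) {
        List<RunStats> runs = Arrays.asList(
                new RunStats(Arrays.asList("job_1", "job_2")),
                new RunStats(Arrays.asList("job_3")));
        System.out.println(totalJobs(runs)); // prints 3
    }
}
```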
>>> >>>
>>> >>>
>>> >>> On Wed, Oct 10, 2012 at 10:28 AM, Bill Graham <[EMAIL PROTECTED]> wrote:
>>> >>>
>>> >>>> Hi Prashant,
>>> >>>>
>>> >>>> [Replying to the dev list to get others' take on these...]
>>> >>>>
>>> >>>> Just curious, why do you prefer a List of JobStats over the already
>>> >>>> existing iterator? I hesitate to add one-liner methods for something
>>> >>>> the caller can do in one line, unless the use case is very common.
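For comparison, the caller-side version: copying an Iterator into a List takes one line with java.util alone (Iterator.forEachRemaining, Java 8+). The string job IDs below are placeholders for JobStats objects, and the helper name toList is illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Sketch: caller-side Iterator-to-List copy using only java.util.
public class IteratorToList {

    // Drains the iterator into a new ArrayList; the iterator is consumed.
    public static <T> List<T> toList(Iterator<T> it) {
        List<T> out = new ArrayList<>();
        it.forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        Iterator<String> jobs = Arrays.asList("job_1", "job_2", "job_3").iterator();
        List<String> jobList = toList(jobs);
        System.out.println(jobList); // prints [job_1, job_2, job_3]
    }
}
```

Apache Commons Collections' IteratorUtils.toList(), used in the proposed getJobList(), performs the same copy.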
>>> >>>>
>>> >>>> Making getSuccessfulJobs() and getFailedJobs() public seems reasonable
>>> >>>> to me.
>>> >>>>
>>> >>>> I'm not sure about the rationale behind the differences between
>>> >>>> registerScript() and store(). store() and registerQuery() can add to the
>>> >>>> DAG manually as statements come in, but registerScript() needs to parse
>>> >>>> the script for execution. That's probably why execution is delegated
>>> >>>> to the GruntParser. The resulting DAG for a single-store script should
>>> >>>> be the same, though. It seems like registerScript() should be able to
>>> >>>> return a list of ExecJobs.
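A rough sketch of the shape suggested above, where registering a script yields one job handle per STORE statement, much as store() yields an ExecJob. Everything here is hypothetical: ScriptSketch, JobHandle, and the naive statement scan stand in for Pig's GruntParser and ExecJob, which do far more. The only point is the List return type in place of void.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch, not Pig's API.
public class ScriptSketch {

    // Stand-in for ExecJob: only records which alias was stored.
    public static final class JobHandle {
        public final String alias;
        public JobHandle(String alias) { this.alias = alias; }
    }

    // Scans pre-split statements and returns one handle per STORE.
    // Real Pig parses the script and builds a DAG; this does not.
    public static List<JobHandle> registerScript(List<String> statements) {
        List<JobHandle> jobs = new ArrayList<>();
        for (String stmt : statements) {
            String s = stmt.trim();
            // Pig Latin keywords are case-insensitive, so compare in upper case.
            if (s.toUpperCase(Locale.ROOT).startsWith("STORE ")) {
                // "STORE a INTO 'out';": the second token is the alias.
                jobs.add(new JobHandle(s.split("\\s+")[1]));
            }
        }
        return jobs;
    }

    public static void main(String[] args) {
        List<JobHandle> jobs = registerScript(Arrays.asList(
                "a = LOAD 'input';",
                "STORE a INTO 'out1';",
                "store a INTO 'out2';"));
        System.out.println(jobs.size() + " store jobs"); // prints "2 store jobs"
    }
}
```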
>>> >>>>
>>> >>>> thanks,
>>> >>>> Bill
>>> >>>>
>>> >>>>
>>> >>>> On Tue, Oct 9, 2012 at 11:22 PM, Prashant Kommireddi <
>>> >>>> [EMAIL PROTECTED]> wrote:
>>> >>>>
>>> >>>>> Hi Bill,
>>> >>>>>
>>> >>>>> I am looking at PigStats and JobGraph, and am thinking of adding some
>>> >>>>> functions. Let me know what you think.
>>> >>>>>
>>> >>>>> *getJobList()* returns a List representation of the iterator.
>>> >>>>>
>>> >>>>> // IteratorUtils.toList() comes from Apache Commons Collections
>>> >>>>> public List<JobStats> getJobList() {
>>> >>>>>     return IteratorUtils.toList(iterator());
>>> >>>>> }
>>> >>>>>
>>> >>>>> What do you think about making getSuccessfulJobs() and getFailedJobs()
>>> >>>>> public and exposing them in the API? They are currently
>>> >>>>> package-private, right?
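A sketch of what public getSuccessfulJobs() and getFailedJobs() accessors would let a caller do. It uses a hypothetical Job class with a success flag in place of JobStats and a hypothetical JobSummary holder. The real methods' return types are not shown in this thread, so List here is an assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-ins: Job for JobStats, JobSummary for the stats holder.
public class JobSummary {

    public static final class Job {
        public final String id;
        public final boolean successful;

        public Job(String id, boolean successful) {
            this.id = id;
            this.successful = successful;
        }
    }

    private final List<Job> jobs;

    public JobSummary(List<Job> jobs) { this.jobs = jobs; }

    // Successful jobs, in their original order.
    public List<Job> getSuccessfulJobs() {
        return jobs.stream().filter(j -> j.successful).collect(Collectors.toList());
    }

    // Failed jobs, in their original order.
    public List<Job> getFailedJobs() {
        return jobs.stream().filter(j -> !j.successful).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        JobSummary summary = new JobSummary(Arrays.asList(
                new Job("job_1", true), new Job("job_2", false), new Job("job_3", true)));
        System.out.println(summary.getSuccessfulJobs().size()); // prints 2
        System.out.println(summary.getFailedJobs().get(0).id);  // prints job_2
    }
}
```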
>>> >>>>>
>>> >>>>> I had another question: it seems the execution flow for
>>> >>>>> PigServer.registerScript/Query is different from PigServer.store().
>>> >>>>> Was there a reason to make these different? store() returns an
>>> >>>>> ExecJob, which is great for getting info about the runs, but