Pig, mail # dev - REMINDER: Pig developer meeting in February


RE: REMINDER: Pig developer meeting in February
Olga Natkovich 2011-02-15, 00:11
We do not have anything public about Penny yet - still trying to figure out when/if it is going out. I don't think there is a whole lot of interaction with the error handling proposal, but I will let Alan comment on that.

Given that the error handling proposal is still not finalized and 0.9 already has lots of changes and little time left, I would suggest delaying it to the release after 0.9.

Olga

-----Original Message-----
From: Dmitriy Ryaboy [mailto:[EMAIL PROTECTED]]
Sent: Monday, February 14, 2011 3:49 PM
To: [EMAIL PROTECTED]
Subject: Re: REMINDER: Pig developer meeting in February

Thanks for that, arvind.

Y! folks, is there any public documentation for Penny?
Is there overlap there with the error handling proposal?

Also: think error handling can make it into 0.9 or are we thinking 0.10?

D

On Mon, Feb 14, 2011 at 12:55 PM, [EMAIL PROTECTED]
<[EMAIL PROTECTED]> wrote:

> Hi,
>
> Sorry for the delay in sending this. Following are the notes from the last
> developer's meeting.
>
> Arvind
> -----------
> *Attendees*
>
>   - From Y!: Alan, Santosh, Romain, Daniel, Richard, Ashutosh, Ben, Julian
>   - From Cloudera: Arvind
>
> *Agenda*
>
>   - Error Handling
>   - Brainstorming Ideas For 0.9
>   - Brainstorming Ideas Beyond 0.9
>
> *Error Handling Suggestions/Proposal Discussion:*
>
>   - Allow each statement to declare an ONERROR clause naming a UDF that
>   takes control when an error occurs.
>      - This would be better than current behavior of exiting on error.
>   - Alternatively, allow ONERROR to be declared for an entire
>   script/session which would allow individual statements to override and
>   provide a more specialized UDF for error handling.
>   - Yet another alternative - allow the specification of a threshold number
>   of errors that Pig ignores before exiting.
>   - Key idea is to ensure that the error handling is focused on data error
>   handling and not control-flow.
>   - Action Item: Post the key proposal on the Wiki.
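
The threshold and handler ideas in the list above can be sketched outside Pig. The following is only an illustration in plain Python, not Pig syntax or any Pig API: the names `apply_with_error_handling`, `on_error`, `max_errors`, and `TooManyErrors` are hypothetical, and the notes do not say how ONERROR would actually be implemented. It assumes a per-record function that raises on bad data, an optional handler that receives the bad record, and a limit on how many bad records are tolerated before the run aborts.

```python
from typing import Any, Callable, Iterable, Iterator, Optional


class TooManyErrors(RuntimeError):
    """Raised once more bad records have been seen than the threshold allows."""


def apply_with_error_handling(
    records: Iterable[Any],
    func: Callable[[Any], Any],
    on_error: Optional[Callable[[Any, Exception], None]] = None,
    max_errors: int = 0,
) -> Iterator[Any]:
    """Apply func to each record, tolerating up to max_errors failures.

    Hypothetical stand-in for a statement with an error handler attached:
    a failing record is passed to on_error (if given) and skipped, so the
    handler deals with data errors only and never alters control flow.
    """
    errors = 0
    for record in records:
        try:
            yield func(record)
        except Exception as exc:
            errors += 1
            if errors > max_errors:
                # Threshold exceeded: fall back to exiting on error.
                raise TooManyErrors(
                    f"{errors} bad records, limit is {max_errors}"
                ) from exc
            if on_error is not None:
                on_error(record, exc)


# Usage: one malformed record is tolerated and routed to the handler.
rejected = []
parsed = list(
    apply_with_error_handling(
        ["1", "x", "3"],
        int,
        on_error=lambda rec, exc: rejected.append(rec),
        max_errors=1,
    )
)
```

With `max_errors=0` and no handler, the sketch reduces to the current exit-on-first-error behavior, which is why the threshold is the only knob that changes when processing stops.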
>
> *Brainstorming Ideas For 0.9:*
>
>   - Internal development done by March
>   - Release tentatively by May
>   - Support for ILLUSTRATE.
>   - Current status:
>      - Parser rewrite almost complete
>      - Working on load data according to schema - support for padding
>      missing values
>      - No support for Boolean type planned yet.
>   - Big features in 0.9
>      - Parser change
>      - Macro support
>      - Jython/Script support
>      - Penny (formerly Inspector Gadget): framework to instrument scripts.
>      Allows detection of bad records that cause failures and implementation
>      of constraints.
>         - Works by integrating with the optimizer to produce wrappers for
>         key UDFs of interest.
>         - Agents can be added in different parts of the query
>         - Prepackaged agents available, but framework allows the creation
>         of custom agents as needed.
>         - Pending work - implementation of unit tests, and turning this
>         into a patch.
>
> *Brainstorming Ideas Beyond 0.9:*
>
>   - Support for different backends for Pig (MR, Piranha, Local, Oozie)
>      - Execution engine that can generate plans specific to the underlying
>      architecture and allow controlling routines to rewrite/re-optimize
>      the plan mid-execution.
>   - Thread safety when running local jobs - to allow better embedding of
>   Pig as a light-weight tool in web-applications and other multi-threaded
>   environments.
>      - Work includes making UDF context thread-safe and removing statics
>      from the implementation.
>      - Will benefit Oozie and other systems that embed Pig without having
>      to worry about side-effects.
>   - Allow execution to resume from where it left off after a runtime
>   failure.
>      - May be done by allowing Oozie as a backend where the plan is
>      converted into an Oozie workflow.
>      - Alternatively Pig could delegate blocks of execution to Oozie.
>   - Scalability: Pig should support users who may not know the intricate