RE: REMINDER: Pig developer meeting in February
We do not have anything public about Penny yet - still trying to figure out when/if it is going out. I don't think there is a whole lot of interaction with the error handling proposal, but I will let Alan comment on that.

Given that the error handling proposal is still not finalized and 0.9 already has lots of changes and little time left, I would suggest delaying it to the release after 0.9.

Olga

-----Original Message-----
From: Dmitriy Ryaboy [mailto:[EMAIL PROTECTED]]
Sent: Monday, February 14, 2011 3:49 PM
To: [EMAIL PROTECTED]
Subject: Re: REMINDER: Pig developer meeting in February

Thanks for that, Arvind.

Y! folks, is there any public documentation for Penny?
Is there overlap there with the error handling proposal?

Also: think error handling can make it into 0.9 or are we thinking 0.10?

D

On Mon, Feb 14, 2011 at 12:55 PM, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

> Hi,
>
> Sorry for the delay in sending this. Following are the notes from the last
> developer's meeting.
>
> Arvind
> -----------
> *Attendees*
>
>   - From Y!: Alan, Santosh, Romain, Daniel, Richard, Ashutosh, Ben, Julian
>   - From Cloudera: Arvind
>
> *Agenda*
>
>   - Error Handling
>   - Brainstorming Ideas For 0.9
>   - Brainstorming Ideas Beyond 0.9
>
> *Error Handling Suggestions/Proposal Discussion:*
>
>   - Allow each statement to declare an ONERROR clause with a UDF that takes
>   control when an error occurs.
>      - This would be better than the current behavior of exiting on error.
>   - Alternatively, allow ONERROR to be declared for an entire
>   script/session which would allow individual statements to override and
>   provide a more specialized UDF for error handling.
>   - Yet another alternative - allow the specification of a threshold number
>   of errors that Pig ignores before exiting.
>   - Key idea is to ensure that the error handling is focused on data error
>   handling and not control-flow.
>   - Action Item: Post the key proposal on the Wiki.
>
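The ONERROR and threshold alternatives above can be illustrated with a minimal sketch. This is plain Python, not Pig Latin or Pig's API: the names `apply_with_onerror`, `on_error`, `max_errors`, and `TooManyErrors` are hypothetical and chosen only for illustration. It assumes an error handler that receives the failing record and the exception, and a threshold of tolerated data errors before the run aborts.

```python
class TooManyErrors(RuntimeError):
    """Raised once the number of bad records exceeds the allowed threshold."""


def apply_with_onerror(records, udf, on_error=None, max_errors=0):
    """Yield udf(record) for each record, routing failures to on_error.

    on_error   -- hypothetical handler called with (record, exception)
    max_errors -- number of failing records tolerated before aborting
    """
    errors = 0
    for record in records:
        try:
            yield udf(record)
        except Exception as exc:
            errors += 1
            if errors > max_errors:
                # Threshold exceeded: stop instead of silently dropping more data.
                raise TooManyErrors(
                    f"{errors} errors exceed threshold {max_errors}"
                ) from exc
            if on_error is not None:
                # The handler only sees the bad record; it does not alter control flow.
                on_error(record, exc)


# Usage: tolerate one unparsable record and collect it for inspection.
bad = []
out = list(
    apply_with_onerror(
        ["1", "x", "3"], int,
        on_error=lambda rec, exc: bad.append(rec),
        max_errors=1,
    )
)
```

The handler here deals with the bad record only, in line with the note that error handling should focus on data errors and not control flow.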
> *Brainstorming Ideas For 0.9:*
>
>   - Internal development done by March
>   - Release tentatively by May
>   - Support for ILLUSTRATE.
>   - Current status:
>      - Parser rewrite almost complete
>      - Working on load data according to schema - support for padding
>      missing values
>      - No support for Boolean type planned yet.
>   - Big features in 0.9
>      - Parser change
>      - Macro support
>      - Jython/Script support
>      - Penny (formerly Inspector Gadget): framework to instrument scripts.
>      Allows detecting bad records that cause failures and implementing
>      constraints.
>         - Works by integrating with the optimizer to produce wrappers for
>         key UDFs of interest.
>         - Agents can be added in different parts of the query
>         - Prepackaged agents available, but framework allows the creation
>         of custom agents as needed.
>         - Pending work - implementation of unit tests, and turning this
>         into a patch.
>
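The notes describe Penny as producing wrappers around UDFs of interest. Since nothing about Penny is public, the following is only a generic sketch of that wrapper idea in plain Python, not Penny's design or API. `BadRecordAgent`, `wrap`, and `parse_int` are hypothetical names; the assumption is an agent that counts calls and remembers the inputs on which the wrapped function fails, then lets the failure propagate.

```python
import functools


class BadRecordAgent:
    """Hypothetical agent that records inputs on which a wrapped function fails."""

    def __init__(self):
        self.calls = 0
        self.bad_records = []

    def wrap(self, udf):
        """Return an instrumented version of udf with unchanged behavior."""
        @functools.wraps(udf)
        def wrapper(*args):
            self.calls += 1
            try:
                return udf(*args)
            except Exception:
                # Remember the offending input, then re-raise unchanged.
                self.bad_records.append(args)
                raise
        return wrapper


def parse_int(value):
    """Stand-in for a user-defined function that can fail on bad data."""
    return int(value)


agent = BadRecordAgent()
instrumented = agent.wrap(parse_int)
```

After a failed run, `agent.bad_records` holds the argument tuples that triggered exceptions, which is one way a custom agent could surface bad records.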
> *Brainstorming Ideas Beyond 0.9:*
>
>   - Support for different backends for Pig (MR, Piranha, Local, Oozie)
>      - Execution engine that can generate plans specific to the underlying
>      architecture and allow controlling routines to rewrite/re-optimize
>      the plan mid-execution.
>   - Thread safety when running local jobs - to allow better embedding of
>   Pig as a light-weight tool in web-applications and other multi-threaded
>   environments.
>      - Work includes making UDF context thread-safe and removing statics
>      from the implementation.
>      - Will benefit Oozie and other systems that embed Pig without having
>      to worry about side-effects.
>   - Allow execution to resume from where it left off after a runtime
>   failure.
>      - May be done by allowing Oozie as a backend where the plan is
>      converted into an Oozie workflow.
>      - Alternatively Pig could delegate blocks of execution to Oozie.
>   - Scalability: Pig should support users who may not know the intricate