

Re: REMINDER: Pig developer meeting in February
There is some related work that overlaps with this, though with (slightly)
different goals and implementations:

http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper37.pdf
http://www.cidrdb.org/cidr2011/Talks/CIDR11_Ikeda.ppt

Ashutosh

On Mon, Feb 14, 2011 at 15:48, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
> Thanks for that, Arvind.
>
> Y! folks, is there any public documentation for Penny?
> Is there overlap there with the error handling proposal?
>
> Also: think error handling can make it into 0.9 or are we thinking 0.10?
>
> D
>
> On Mon, Feb 14, 2011 at 12:55 PM, [EMAIL PROTECTED]
> <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> Sorry for the delay in sending this. The following are the notes from the
>> last developers' meeting.
>>
>> Arvind
>> -----------
>> *Attendees*
>>
>>   - From Y!: Alan, Santosh, Romain, Daniel, Richard, Ashutosh, Ben, Julian
>>   - From Cloudera: Arvind
>>
>> *Agenda*
>>
>>   - Error Handling
>>   - Brainstorming Ideas For 0.9
>>   - Brainstorming Ideas Beyond 0.9
>>
>> *Error Handling Suggestions/Proposal Discussion:*
>>
>>   - Allow each statement to declare an ONERROR clause with a UDF that takes
>>   control in case of an error.
>>      - This would be better than the current behavior of exiting on error.
>>   - Alternatively, allow ONERROR to be declared for an entire
>>   script/session, while still allowing individual statements to override it
>>   with a more specialized UDF for error handling.
>>   - Yet another alternative - allow the specification of a threshold number
>>   of errors that Pig ignores before exiting.
>>   - Key idea is to ensure that the error handling is focused on data error
>>   handling and not control-flow.
>>   - Action Item: Post the key proposal on the Wiki.
>>
>> *Brainstorming Ideas For 0.9:*
>>
>>   - Internal development done by March
>>   - Release tentatively by May
>>   - Support for ILLUSTRATE.
>>   - Current status:
>>      - Parser rewrite almost complete
>>      - Working on loading data according to schema - support for padding
>>      missing values
>>      - No support for Boolean type planned yet.
>>   - Big features in 0.9
>>      - Parser change
>>      - Macro support
>>      - Jython/Script support
>>      - Penny (formerly Inspector Gadget): framework to instrument scripts.
>>      Allows detection of bad records that cause failures and
>>      implementation of constraints.
>>         - Works by integrating with the optimizer to produce wrappers for
>>         key UDFs of interest.
>>         - Agents can be added in different parts of the query
>>         - Prepackaged agents available, but framework allows the creation
>>         of custom agents as needed.
>>         - Pending work - implementation of unit tests, and turning this
>>         into a patch.
>>
>> *Brainstorming Ideas Beyond 0.9:*
>>
>>   - Support for different backends for Pig (MR, Piranha, Local, Oozie)
>>      - Execution engine that can generate plans specific to the underlying
>>      architecture and allow controlling routines to rewrite/re-optimize
>>      the plan mid-execution.
>>   - Thread safety when running local jobs - to allow better embedding of
>>   Pig as a light-weight tool in web-applications and other multi-threaded
>>   environments.
>>      - Work includes making UDF context thread-safe and removing statics
>>      from the implementation.
>>      - Will benefit Oozie and other systems that embed Pig without having
>>      to worry about side-effects.
>>   - Allow execution to resume from where it left off after a runtime
>>   failure.
>>      - May be done by allowing Oozie as a backend where the plan is
>>      converted into an Oozie workflow.
>>      - Alternatively Pig could delegate blocks of execution to Oozie.
>>   - Scalability: Pig should support users who may not know the intricate
>>   details of the job/architecture. Things such as memory allocation, skew
>>   handling, etc. should be handled automatically without user involvement.
>>   - Allow pig to kill jobs already submitted if the shell exits due to a