Pig >> mail # dev >> REMINDER: Pig developer meeting in February

Olga Natkovich 2011-02-03, 18:42
Benjamin Reed 2011-02-03, 19:24
Ashutosh Chauhan 2011-02-03, 19:28
Olga Natkovich 2011-02-08, 18:29
Dmitriy Ryaboy 2011-02-09, 05:38
Dmitriy Ryaboy 2011-02-12, 01:33
Santhosh Srinivasan 2011-02-12, 02:30
arvind@...) 2011-02-14, 20:55
Dmitriy Ryaboy 2011-02-14, 23:48
Olga Natkovich 2011-02-15, 00:11
Renato Marroquín Mogrovej... 2011-02-15, 00:53
Alan Gates 2011-02-15, 08:54

Re: REMINDER: Pig developer meeting in February
There is some related, overlapping work, though with (slightly) different
goals and implementations:

http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper37.pdf
http://www.cidrdb.org/cidr2011/Talks/CIDR11_Ikeda.ppt

Ashutosh

On Mon, Feb 14, 2011 at 15:48, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
> Thanks for that, arvind.
>
> Y! folks, is there any public documentation for Penny?
> Is there overlap there with the error handling proposal?
>
> Also: think error handling can make it into 0.9 or are we thinking 0.10?
>
> D
>
> On Mon, Feb 14, 2011 at 12:55 PM, [EMAIL PROTECTED]
> <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> Sorry for the delay in sending this. Following are the notes from the last
>> developer's meeting.
>>
>> Arvind
>> -----------
>> *Attendees*
>>
>>   - From Y!: Alan, Santhosh, Romain, Daniel, Richard, Ashutosh, Ben, Julian
>>   - From Cloudera: Arvind
>>
>> *Agenda*
>>
>>   - Error Handling
>>   - Brainstorming Ideas For 0.9
>>   - Brainstorming Ideas Beyond 0.9
>>
>> *Error Handling Suggestions/Proposal Discussion:*
>>
>>   - Allow each statement to declare an ONERROR clause with a UDF that takes
>>   control in case of an error.
>>      - This would be better than the current behavior of exiting on error.
>>   - Alternatively, allow ONERROR to be declared for an entire
>>   script/session, with individual statements able to override it and
>>   provide a more specialized UDF for error handling.
>>   - Yet another alternative: allow the specification of a threshold number
>>   of errors that Pig ignores before exiting.
>>   - The key idea is to ensure that error handling is focused on data errors
>>   and not on control flow.
>>   - Action Item: Post the key proposal on the Wiki.
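
[Editorial illustration, not part of the original notes.] The ONERROR and
error-threshold ideas above can be sketched in a few lines. This is a minimal
sketch in plain Python, not Pig Latin and not Pig's actual API: the names
`apply_with_onerror`, `on_error`, `max_errors`, and `ErrorThresholdExceeded`
are all hypothetical, and the semantics (the handler sees each bad record,
and the run aborts once failures exceed the threshold) are an assumed reading
of the proposal.

```python
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class ErrorThresholdExceeded(RuntimeError):
    """Raised when more records fail than the configured threshold allows."""


def apply_with_onerror(
    records: Iterable[T],
    udf: Callable[[T], R],
    on_error: Optional[Callable[[T, Exception], None]] = None,
    max_errors: int = 0,
) -> Iterator[R]:
    """Apply `udf` to each record, routing failures to `on_error`.

    Hypothetical semantics: up to `max_errors` bad records are skipped;
    one more failure aborts the whole run.
    """
    errors = 0
    for record in records:
        try:
            result = udf(record)
        except Exception as exc:
            errors += 1
            # Hand the bad record to the user-supplied handler, if any.
            if on_error is not None:
                on_error(record, exc)
            # Abort once the number of failures exceeds the threshold.
            if errors > max_errors:
                raise ErrorThresholdExceeded(
                    f"{errors} errors exceed threshold of {max_errors}"
                ) from exc
            continue
        yield result


if __name__ == "__main__":
    rejected = []
    parsed = list(
        apply_with_onerror(
            ["1", "x", "3"],
            int,
            on_error=lambda rec, exc: rejected.append(rec),
            max_errors=1,
        )
    )
    print(parsed, rejected)
```

With `max_errors=0` the sketch reproduces the exit-on-first-error behavior
the notes describe as current; the handler only ever sees data errors and
cannot redirect control flow, in line with the key idea above.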
>>
>> *Brainstorming Ideas For 0.9:*
>>
>>   - Internal development done by March
>>   - Release tentatively by May
>>   - Support for ILLUSTRATE.
>>   - Current status:
>>      - Parser rewrite almost complete
>>      - Working on loading data according to schema, with support for
>>      padding missing values
>>      - No support for Boolean type planned yet.
>>   - Big features in 0.9
>>      - Parser change
>>      - Macro support
>>      - Jython/Script support
>>      - Penny (formerly Inspector Gadget): framework to instrument scripts.
>>      Allows detection of bad records that cause failures and
>>      implementation of constraints.
>>         - Works by integrating with the optimizer to produce wrappers for
>>         key UDFs of interest.
>>         - Agents can be added in different parts of the query
>>         - Prepackaged agents available, but framework allows the creation
>>         of custom agents as needed.
>>         - Pending work - implementation of unit tests, and turning this
>>         into a patch.
>>
>> *Brainstorming Ideas Beyond 0.9:*
>>
>>   - Support for different backends for Pig (MR, Piranha, Local, Oozie)
>>      - Execution engine that can generate plans specific to the underlying
>>      architecture and allow controlling routines to rewrite/re-optimize
>>      the plan mid-execution.
>>   - Thread safety when running local jobs - to allow better embedding of
>>   Pig as a light-weight tool in web-applications and other multi-threaded
>>   environments.
>>      - Work includes making UDF context thread-safe and removing statics
>>      from the implementation.
>>      - Will benefit Oozie and other systems that embed Pig without having
>>      to worry about side-effects.
>>   - Allow execution to resume from where it left off after a runtime
>>   failure.
>>      - May be done by allowing Oozie as a backend where the plan is
>>      converted into an Oozie workflow.
>>      - Alternatively Pig could delegate blocks of execution to Oozie.
>>   - Scalability: Pig should support users who may not know the intricate
>>   details of the job/architecture. Things such as memory allocation and
>>   skew handling should happen automatically, without user involvement.
>>   - Allow pig to kill jobs already submitted if the shell exits due to a
Milind Bhandarkar 2011-02-23, 02:08