Pig dev mailing list
Notes from the contributor meeting
Hi,

Attendees:

Dmitriy Ryaboy
Alan Gates
Ashutosh Chauhan
Daniel Dai
Xuefu Zhang
Richard Ding
Olga Natkovich

Topics discussed:
(1) Improving Pig testing:

a. Short term

    i. Make tests run significantly faster. Dmitriy said he would work on moving the tests to local mode. Hopefully that will reduce the run time from about 10 hours to about 3.

    ii. Get test-patch automation running again. I took an action item to follow up on this.

b. Longer term

    i. Move beyond unit testing. Alan suggested that once the recently open-sourced e2e harness is ready to use (3-6 months), we move most of the e2e tests we currently run as unit tests into the e2e harness and keep only true unit tests in JUnit. This will reduce unit test runtime to under an hour and will let us run the e2e tests on real data and real clusters, making the testing more realistic.

    ii. Figure out a way to make UDF testing easier. I don't think we came up with many good ideas on how to do this; it needs further discussion.

(2) Discussion on release management. The main goal is to keep production systems stable while still allowing changes to be released quickly. We came up with the following proposal:

a. Make major releases time-based (not feature-based), with a release every 3 months.

b. Keep post-release branches stable by allowing only P1 changes (failures with no reasonable workaround, or silent failures).

c. Develop disruptive features (for example, parser changes) on separate branches and fold them in only once the code is complete and stable.

(3) Discussion on revamping the UDF interface:

a. Make the interface simpler: no need to implement 3 different versions.

b. Make it more intuitive:

    i. No need to wrap input parameters in tuples

    ii. No need to cast parameters

    iii. Simpler schema management

    iv. Simpler overloading

c. The new interface will need to coexist with the current approach for a significant amount of time (6-12 months) to let users transition.

(4) Status of Piggybank

a. Not much progress so far. Dmitriy is struggling with the build process.

Other attendees: please feel free to add anything I missed.

Olga