Olga Natkovich
2011-03-03, 00:52
Dmitriy Ryaboy
2011-03-03, 02:31
Santhosh Srinivasan
2011-03-03, 02:44
Dmitriy Ryaboy
2011-03-03, 02:57
Santhosh Srinivasan
2011-03-03, 02:58
Alan Gates
2011-03-03, 18:43
Santhosh Srinivasan
2011-03-03, 19:51
Thejas M Nair
2011-03-04, 01:34
Santhosh Srinivasan
2011-03-04, 01:50
Dmitriy Ryaboy
2011-03-04, 01:53
Eric Lubow
2011-03-03, 20:03
Corbin Hoenes
2011-03-04, 12:45
Kaluskar, Sanjay
2011-03-04, 00:53
Jai Krishna
2011-03-03, 04:14
Mridul Muralidharan
2011-03-04, 13:48
-
[DISCUSSION] Pig.next
Olga Natkovich 2011-03-03, 00:52
Pig Users and Developers,

We are starting to plan the work after Pig 0.9. One thing we need to decide is what name/number to give to the next release: Pig 0.10 or Pig 1.0.

I believe that we are ready to declare 1.0. Here are my reasons:

(1) We are mature enough and produce good-quality releases.
(2) Our interfaces no longer change in major ways.
(3) We have a growing user community, and we want newcomers to know that our releases are stable.
(4) If the next release is 0.10 and we then decide to switch on the following release, going from 0.10 to 1.0 will generate a lot of confusion.

I wanted to start this conversation and see what others think before deciding whether it is worthwhile to call a vote.

Olga
-
Re: [DISCUSSION] Pig.next
Dmitriy Ryaboy 2011-03-03, 02:31
I am worried that the new optimization plan work has not had a chance to settle in, and we are releasing a brand new parser for the language in 0.9. Those are pretty significant changes; if the idea behind calling something a "1.0" is stability, we may want to give them a release to mature a bit.

Of course we can just release 0.9.x for a while until we feel this stuff has been tested in a wide enough variety of installations / Hadoop configurations / use cases.

D
-
RE: [DISCUSSION] Pig.next
Santhosh Srinivasan 2011-03-03, 02:44
I am in agreement with Dmitriy. In addition, Hadoop itself has not gone 1.0 due to the lack of stable APIs. We should probably aim for 1.0 around the same time.

Santhosh
-
Re: [DISCUSSION] Pig.next
Dmitriy Ryaboy 2011-03-03, 02:57
By way of crazy ideas -- I kind of feel like 0.8 + a few patches might be our 1.0, and 0.9 can be the 1.1 branch.

D
-
RE: [DISCUSSION] Pig.next
Santhosh Srinivasan 2011-03-03, 02:58
I am not in agreement with that :)
-
Re: [DISCUSSION] Pig.next
Alan Gates 2011-03-03, 18:43
I agree that there will probably need to be several 0.9.x releases as the new optimization and parser work mature. As a consequence of this it may be longer between 0.9 and Pig.next than there has been between the last few releases. That only delays the question of what we call Pig.next; it does not answer it.

To me, declaring 1.0 would mean the following things:

1) Pig is ready for production use, at least by the brave.
2) It is still rough around the edges; you do not get a smooth product until 2.0 or later.
3) We will not make non-backward compatible changes to interfaces we have declared stable.

Pig is in use in production in multiple places; I do not think anyone will argue that it is not rough around the edges; and because we have users who run tens of thousands of Pig jobs daily, non-backward compatible changes are impossible anyway.

As for waiting for Hadoop to go 1.0, that is like waiting for Congress to fix social security. I am sure they will get there, but I may be retired first. In all seriousness, the Hadoop project has not been moving with speed or agility over the last few years, and I do not think waiting for them to do something is a good idea. Nor do I see it as necessary. Before we could go 1.0 would we insist that every jar we import is >= 1.0? Yes, we are bound more tightly to Hadoop than we are to log4j. But we are still our own project. 1.0 is a claim we are making about ourselves, not about the platform we run on. We should choose our release numbering in a way that sends a clear message to our users, and let those same users evaluate Hadoop separately.

Also, the argument that we should not go 1.0 because we are changing a lot of things is bogus. We are always changing a lot of things. If 1.0 means we will not make any major changes, then we will not get there until we go into some kind of maintenance mode where we deem the majority of the work to have been done. I hope I have retired before we reach that state.

My perspective on what 1.0 means obviously comes from a developer inside the project. I would be interested in hearing from users and anyone with a more marketing-oriented perspective on what message 1.0 would send to (potential) Pig users.

Alan.
-
RE: [DISCUSSION] Pig.next
Santhosh Srinivasan 2011-03-03, 19:51
Hilarious.

Getting to the serious points.

What are the user-facing items? I have listed a few below. Please feel free to add if I have missed out on anything.

1. The language syntax
2. The language semantics
3. UDFs (EvalFunc, Load, Store, Algebraic, Accumulator, etc.)
4. Java APIs (PigServer, etc.)

In the past, we have agreed that Pig will support Hadoop APIs. I think it's very important to understand when Hadoop will stabilize the APIs. It will have an impact on the APIs that we expose to our users (e.g., input and output formats).

I strongly believe that this is an important input in the decision-making process, especially wrt backward compatibility.

Santhosh
-
Re: [DISCUSSION] Pig.next
Thejas M Nair 2011-03-04, 01:34
The interfaces that Pig has are at different levels of maturity, and most of the interfaces have been marked as stable or evolving to indicate that. Most of the core interfaces, including the language and UDFs, belong to the stable category. I think this is sufficient for 1.0. There will always be some new interfaces that will be in the evolving category.

The Hadoop classes used by the load/store functions probably belong to the 'slowly evolving' category. But I don't think any change is anticipated soon. By the time it changes we might be ready for Pig 2.0!

Regarding the impact of big changes in 0.8 and 0.9 not having had the time to settle in, I think by the time 1.0/0.10 is ready those changes will have been well tested in all sorts of setups/configurations.

-Thejas
-
RE: [DISCUSSION] Pig.next
Santhosh Srinivasan 2011-03-04, 01:50
>> The hadoop classes used by the load/store functions probably belong to the 'slowly evolving' category. But I don't think any change is anticipated soon. By the time it changes we might be ready for pig 2.0 !

Exactly! How do you know that no changes are anticipated? We need inputs from the Hadoop team because we don't know. If they promise that these APIs will not change, let's say till mid-2012, then we should be good to go. If they say that it will change in 2011, then we will be breaking backward compatibility pretty soon.

Santhosh
-
Re: [DISCUSSION] Pig.next
Dmitriy Ryaboy 2011-03-04, 01:53
Only if we start supporting a different version of Hadoop.

And they did just un-deprecate the "old" interface...
-
Re: [DISCUSSION] Pig.next
Eric Lubow 2011-03-03, 20:03
Coming from a user's perspective, I would have the following to say:

Anyone who is using Hadoop has an obvious understanding that 1.0 doesn't really mean much if it's in use (which Pig obviously is). What 1.0 has the potential to do for someone like me is that I may be able to go to Amazon and say, look, Pig is at 1.0 and you are still offering 0.6 on EMR. Having Pig on something like EMR is what allows more widespread adoption because it lowers the barrier to entry.

I am not an expert at any of this stuff (in fact, I don't even know Java), but I am able to use Hadoop and then train others to write MR jobs with a fair amount of ease because of a query language like Pig. Tagging it with 1.0 might make a statement to larger organizations, but most smaller companies and startups just want to know it's usable. And since there is no alpha or beta attached anywhere, that's good enough for most.

The only caveat is that I am working off of Pig 0.6 because all my data is in S3 and I use Elastic Map Reduce for my jobs.

The only other thing I would say is that if Pig goes 1.0, can it get a new logo? I know there are a lot of +1s for this so I figured I would throw my +1 here too.

-e

Eric Lubow
e: [EMAIL PROTECTED]
w: eric.lubow.org
-
Re: [DISCUSSION] Pig.next
Corbin Hoenes 2011-03-04, 12:45
What is wrong with porky the pig as the logo?
:) That's all folks!

Sent from my iPhone
-
RE: [DISCUSSION] Pig.next
Kaluskar, Sanjay 2011-03-04, 00:53
Alan,
Here's another perspective, based on the conventions used in most of the products I have worked on (okay, that's not a lot but some of them are well regarded by customers). Rather than focusing on the specific number, it is the transition which is important & tells users something. Let me explain - most products use a <major>.<minor>.<patch> style 3-number release numbering externally. Change of each of these numbers has some significance:

- there should be complete (binary) backward compatibility for interfaces across patch releases, interoperability with other products; product change should be primarily bug fixes
- there should be backward compatibility for interfaces across minor releases, there may be some interoperability changes (e.g., requiring a different version of one of the dependencies); product change is expected to contain new features
- there can be substantial changes across major releases (architecture, interfaces, interoperability); interfaces (APIs, callback interfaces, etc.) are still expected to be source-level compatible (i.e., you may ask clients to recompile); in rare cases you can break interfaces and ask external code to be re-written/edited.

By this definition, 0.7.0 was probably 1.0.0 (given that UDFs were forced to make code changes).

-sanjay
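The <major>.<minor>.<patch> convention described in this message can be sketched in a few lines of Python. This is only an illustration: the names `parse_version`, `classify_transition` and `EXPECTATIONS` are hypothetical, the expectation strings paraphrase the bullets above, and nothing here reflects an official Pig or Apache policy.

```python
from typing import Tuple

# Hypothetical summary of what each kind of transition promises,
# paraphrased from the three bullets in the message above.
EXPECTATIONS = {
    "patch": "binary-compatible interfaces; primarily bug fixes",
    "minor": "backward-compatible interfaces; new features, dependencies may change",
    "major": "substantial changes allowed; clients may need to recompile or, rarely, edit code",
}


def parse_version(text: str) -> Tuple[int, int, int]:
    """Parse '<major>.<minor>.<patch>'; missing trailing parts count as 0."""
    parts = [int(piece) for piece in text.split(".")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"expected up to three numeric parts, got {text!r}")
    major, minor, patch = parts + [0] * (3 - len(parts))
    return major, minor, patch


def classify_transition(old: str, new: str) -> str:
    """Name the most significant component that changed between two releases."""
    before, after = parse_version(old), parse_version(new)
    if after <= before:
        raise ValueError(f"{new!r} is not newer than {old!r}")
    if after[0] != before[0]:
        return "major"
    if after[1] != before[1]:
        return "minor"
    return "patch"


if __name__ == "__main__":
    for old, new in [("0.9.0", "0.9.1"), ("0.9", "0.10"), ("0.10", "1.0")]:
        kind = classify_transition(old, new)
        print(f"{old} -> {new}: {kind} ({EXPECTATIONS[kind]})")
```

Under this reading, going from 0.9 to 0.10 is a minor transition and going from 0.10 to 1.0 is a major one, which is why the choice of number carries a compatibility message for users.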
-
Re: [DISCUSSION] Pig.next
Jai Krishna 2011-03-03, 04:14
I tend to interpret Hadoop 0.21 and Pig 0.9 as "Hadoop has had 21 releases" and "Pig has had 9 releases" respectively.
In keeping with that, Pig version numbers that trail Hadoop seem logically consistent because Pig, in practice, primarily works off Hadoop (though it can do local mode, drive non Hadoop backends etc.). So, Hadoop at 0.21 and Pig at 0.10 seems right.

Of course, I may be missing a lot of things here with regard to how Apache projects work.

Thanks
Jai
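Reading "0.10" as "the tenth release" only works if version strings are compared component by component rather than as decimal numbers, which is one source of the confusion mentioned earlier in the thread. A minimal sketch, assuming plain dotted numeric versions (the helper `version_key` and the release list are illustrative, not part of Pig):

```python
def version_key(text: str) -> tuple:
    """Turn '0.10' into (0, 10) so that releases sort by component."""
    return tuple(int(piece) for piece in text.split("."))


releases = ["0.9", "0.10", "1.0", "0.8"]

# Component-wise ordering matches the release history.
by_component = sorted(releases, key=version_key)
# Decimal ordering treats 0.10 as 0.1, placing it before 0.8.
by_decimal = sorted(releases, key=float)

print(by_component)  # ['0.8', '0.9', '0.10', '1.0']
print(by_decimal)    # ['0.10', '0.8', '0.9', '1.0']
```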
-
Re: [DISCUSSION] Pig.next
Mridul Muralidharan 2011-03-04, 13:48
IMO 1.0 for a product typically promises:

1) Reasonable stability of interfaces. Typically only major version changes break interface compatibility. While we are at 0.x, it seems to be considered 'okish' to violate this, but once you are at 1.0 and higher, breaking interface contracts will not be desired behavior. We should be reasonably confident about the interfaces we expose to users: this includes the shell, exec envs, properties, API and SPIs. (This also depends on Hadoop, btw.)

2) Reasonable stability and code quality. Typically a major release promises reasonable rigor in terms of code quality, stability and functionality. As mentioned, it is easier to get Amazon, etc. to move to Pig 1.0, but probably not so for 0.7 or 1.0.1 or 1.1, etc. Declaring something as 1.0 typically carries this expectation. Considering the pretty invasive changes which have happened of late, maybe we do need a cool-off period for the code to settle, focusing on bugs instead of features, if we want a 1.0 release? Though as developers we always want to work on new & exciting things, we should balance that against user expectations for a stable product.

3) Reasonable 'polish' in the product. In general, it is not very easy to use Pig, and it keeps violating the principle of least surprise even after having used it for 3+ years now. This is typically related to schema, parsing, changing UDF contracts, property interactions, multi-query optimization effects, null handling/interactions and the like. A lot of it is probably just due to idioms and expectations which are not well known, bugs which should be filed, problems of trying to debug in a distributed cluster, and constructs which are not adequately/well-defined, and a lot is due to the mismatch between novice-user and pig-dev expectations. We tend to work around/avoid a lot of these issues without a second thought, but exposing Pig to someone new does bring out the confusion. Considering the scope of Pig, this is probably a pie in the sky goal - but it definitely would be good if Pig "felt" stable and usable without need for too many caveats/gotchas.

Until these are "reasonably" well tackled, imo, it is not a good idea to go 1.0.

Regards,
Mridul
|