-
[DISCUSS] Apache Hadoop 1.0?Arun C Murthy 2011-11-14, 22:11
Folks,
Apache Hadoop has come a long way since our humble beginnings. As a community we've made significant progress, even in 2011 - we've had 3 releases off the branch-0.20-security (0.20.205 being the latest) and we just released 0.23.0 last week, our first major release off trunk in a while.
With hadoop-0.20.205 we finally have an Apache release with both security and HBase support, both critical for the growing ecosystem.
With that, I think it's time to call it hadoop-1.0. The 1.0 moniker is something we've wanted for a while and I think it's time for us to just ship it. Linus did something similar with GNU/Linux 3.0.
Yes, we could add more features or improve it along many dimensions (a la hadoop-0.23), but right now we have a pretty decent piece of software, i.e., the feature set in hadoop-0.20.205 is compelling and widely used. We could call hadoop-0.23 (or 0.22) 2.0, etc. I do think we, as a community, can support compatibility in the hadoop-1.x series, which is the essential ingredient. This isn't a brand new idea; Doug suggested this a long while ago.
Thoughts?
thanks,
Arun
-
Re: [DISCUSS] Apache Hadoop 1.0?Doug Cutting 2011-11-14, 22:41
To be specific, I think one of the following could be sensible:
A. Rename as follows:
0.20 -> 1.0
0.21 -> 1.1
0.22 -> 1.2
0.23 -> 2.0
0.24 -> 2.1
B. Just drop the leading zero, e.g., 0.23.0 becomes 23.0.
Doug
On 11/14/2011 02:11 PM, Arun C Murthy wrote:
> Folks,
>
> Apache Hadoop has come a long way since our humble beginnings. As a community we've made significant progress, even in 2011 - we've had 3 releases off the branch-0.20-security (0.20.205 being the latest) and we just released 0.23.0 last week, our first major release off trunk in a while.
>
> With hadoop-0.20.205 we finally have an Apache release with both security and HBase support, both critical for the growing ecosystem.
>
> With that, I think it's time to call it as hadoop-1.0. The 1.0 moniker has something we've wanted for a while and I think it's time for us to just ship it. Linus did something similar with GNU/Linux 3.0.
>
> Yes, we could add more features or better it along many dimensions (ala hadoop-0.23), but right now we have a pretty decent piece of software i.e. the feature set in hadoop-0.20.205 is compelling and widely used. We could call hadoop-0.23 (or 0.22) as 2.0 etc. I do think we, as a community, can support compatibility in the hadoop-1.x series, which is the essential ingredient. This isn't a brand new idea, Doug suggested this a long while ago.
>
> Thoughts?
>
> thanks,
> Arun
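Doug's two options are both plain string mappings, so they can be sketched in a few lines. This is only an illustration: the function names `rename_option_a` and `rename_option_b` are made up here, the table is copied from option A as posted, and it covers release series only (no point releases).

```python
# Option A: explicit table, copied from the message above.
OPTION_A = {
    "0.20": "1.0",
    "0.21": "1.1",
    "0.22": "1.2",
    "0.23": "2.0",
    "0.24": "2.1",
}


def rename_option_a(series: str) -> str:
    """Look up the new series name under option A (hypothetical helper)."""
    if series not in OPTION_A:
        raise ValueError(f"option A does not list series {series!r}")
    return OPTION_A[series]


def rename_option_b(version: str) -> str:
    """Option B: drop the leading '0.' (hypothetical helper)."""
    if not version.startswith("0."):
        raise ValueError(f"expected a 0.x version, got {version!r}")
    return version[2:]


if __name__ == "__main__":
    print(rename_option_a("0.23"))
    print(rename_option_b("0.23.0"))
```

Option A needs a table because the mapping is not arithmetic (0.22 and 0.23 land in different major versions), while option B is a pure prefix strip.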
-
Re: [DISCUSS] Apache Hadoop 1.0?Mattmann, Chris A 2011-11-14, 22:47
On Nov 14, 2011, at 2:41 PM, Doug Cutting wrote:
> To be specific, I think one of the possible could be sensible: > > A. Rename as follows: > > 0.20 -> 1.0 > 0.21 -> 1.1 > 0.22 -> 1.2 > 0.23 -> 2.0 > 0.24 -> 2.1 I like this one, Doug. +1. Cheers, Chris > > B. Just drop the leading zero, e.g., 0.23.0 becomes 23.0. > > Doug > > On 11/14/2011 02:11 PM, Arun C Murthy wrote: >> Folks, >> >> Apache Hadoop has come a long way since our humble beginnings. As a community we've made significant progress, even in 2011 - we've had 3 releases off the branch-0.20-security (0.20.205 being the latest) and we just released 0.23.0 last week, our first major release off trunk in a while. >> >> With hadoop-0.20.205 we finally have an Apache release with both security and HBase support, both critical for the growing ecosystem. >> >> With that, I think it's time to call it as hadoop-1.0. The 1.0 moniker has something we've wanted for a while and I think it's time for us to just ship it. Linus did something similar with GNU/Linux 3.0. >> >> Yes, we could add more features or better it along many dimensions (ala hadoop-0.23), but right now we have a pretty decent piece of software i.e. the feature set in hadoop-0.20.205 is compelling and widely used. We could call hadoop-0.23 (or 0.22) as 2.0 etc. I do think we, as a community, can support compatibility in the hadoop-1.x series, which is the essential ingredient. This isn't a brand new idea, Doug suggested this a long while ago. >> >> Thoughts? >> >> thanks, >> Arun >> >> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: [EMAIL PROTECTED] WWW: http://sunset.usc.edu/~mattmann/ ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Adjunct Assistant Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +
-
Re: [DISCUSS] Apache Hadoop 1.0?Todd Papaioannou 2011-11-14, 22:47
A) is MUCH better from a product branding stand point.. which is what this is mostly about. I would go for something along those lines.
ToddP On Nov 14, 2011, at 2:41 PM, Doug Cutting wrote: > To be specific, I think one of the possible could be sensible: > > A. Rename as follows: > > 0.20 -> 1.0 > 0.21 -> 1.1 > 0.22 -> 1.2 > 0.23 -> 2.0 > 0.24 -> 2.1 > > B. Just drop the leading zero, e.g., 0.23.0 becomes 23.0. > > Doug > > On 11/14/2011 02:11 PM, Arun C Murthy wrote: >> Folks, >> >> Apache Hadoop has come a long way since our humble beginnings. As a community we've made significant progress, even in 2011 - we've had 3 releases off the branch-0.20-security (0.20.205 being the latest) and we just released 0.23.0 last week, our first major release off trunk in a while. >> >> With hadoop-0.20.205 we finally have an Apache release with both security and HBase support, both critical for the growing ecosystem. >> >> With that, I think it's time to call it as hadoop-1.0. The 1.0 moniker has something we've wanted for a while and I think it's time for us to just ship it. Linus did something similar with GNU/Linux 3.0. >> >> Yes, we could add more features or better it along many dimensions (ala hadoop-0.23), but right now we have a pretty decent piece of software i.e. the feature set in hadoop-0.20.205 is compelling and widely used. We could call hadoop-0.23 (or 0.22) as 2.0 etc. I do think we, as a community, can support compatibility in the hadoop-1.x series, which is the essential ingredient. This isn't a brand new idea, Doug suggested this a long while ago. >> >> Thoughts? >> >> thanks, >> Arun >> >> >> +
-
Re: [DISCUSS] Apache Hadoop 1.0?Milind.Bhandarkar@... 2011-11-14, 22:44
Arun,
You beat me to start this discussion :-) I was at Apachecon recently, and based on the questions and comments from several attendees for the hadoop sessions, as well as the hadoop meetup afterwards, it was clear that users are perplexed about our versioning strategies. In addition, Doug and Owen also have publicly stated (in #hw2011 and #apachecon11 respectively) that 0.20.2xx should be considered a 1.0. There is a perception (no doubt caused by 0.19 and 0.21 *abandonment*) that releases ending in odd numbers are unstable releases. So, some users were confused when some speakers urged folks to try out 0.23. I second your proposal that 0.20.2xx should be called 1.x. Based on some encouraging results reported on 0.22, I propose that it should be called 2.0. Which makes 0.23 as the 3.0. So, +1! - milind On 11/14/11 2:11 PM, "Arun C Murthy" <[EMAIL PROTECTED]> wrote: >Folks, > >Apache Hadoop has come a long way since our humble beginnings. As a >community we've made significant progress, even in 2011 - we've had 3 >releases off the branch-0.20-security (0.20.205 being the latest) and we >just released 0.23.0 last week, our first major release off trunk in a >while. > >With hadoop-0.20.205 we finally have an Apache release with both security >and HBase support, both critical for the growing ecosystem. > >With that, I think it's time to call it as hadoop-1.0. The 1.0 moniker >has something we've wanted for a while and I think it's time for us to >just ship it. Linus did something similar with GNU/Linux 3.0. > >Yes, we could add more features or better it along many dimensions (ala >hadoop-0.23), but right now we have a pretty decent piece of software >i.e. the feature set in hadoop-0.20.205 is compelling and widely used. >We could call hadoop-0.23 (or 0.22) as 2.0 etc. I do think we, as a >community, can support compatibility in the hadoop-1.x series, which is >the essential ingredient. This isn't a brand new idea, Doug suggested >this a long while ago. > >Thoughts? 
> >thanks, >Arun > > > > +
-
Re: [DISCUSS] Apache Hadoop 1.0?Mattmann, Chris A 2011-11-14, 22:46
Hey Guys,
My super +1 for calling one 0.20.205 as 1.0. 1 point oh! Cheers, Chris On Nov 14, 2011, at 2:11 PM, Arun C Murthy wrote: > Folks, > > Apache Hadoop has come a long way since our humble beginnings. As a community we've made significant progress, even in 2011 - we've had 3 releases off the branch-0.20-security (0.20.205 being the latest) and we just released 0.23.0 last week, our first major release off trunk in a while. > > With hadoop-0.20.205 we finally have an Apache release with both security and HBase support, both critical for the growing ecosystem. > > With that, I think it's time to call it as hadoop-1.0. The 1.0 moniker has something we've wanted for a while and I think it's time for us to just ship it. Linus did something similar with GNU/Linux 3.0. > > Yes, we could add more features or better it along many dimensions (ala hadoop-0.23), but right now we have a pretty decent piece of software i.e. the feature set in hadoop-0.20.205 is compelling and widely used. We could call hadoop-0.23 (or 0.22) as 2.0 etc. I do think we, as a community, can support compatibility in the hadoop-1.x series, which is the essential ingredient. This isn't a brand new idea, Doug suggested this a long while ago. > > Thoughts? > > thanks, > Arun > > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: [EMAIL PROTECTED] WWW: http://sunset.usc.edu/~mattmann/ ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Adjunct Assistant Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +
-
Re: [DISCUSS] Apache Hadoop 1.0?Konstantin Boudnik 2011-11-14, 23:01
+1 on graduating .205 as 1.0. It is a very mature and widely used
version of Hadoop and really has a significant bang for a buck! It seems that making 0.22 to be 2.0 has a lot of sense because its coming release carries a number of significant changes qualifying it to be a major release. .23 seems to be a good candidate for 3.0 for exactly the same reasons with MR2 framework and all. Seems like a great time for the move! Cos On Mon, Nov 14, 2011 at 02:11PM, Arun C Murthy wrote: > Folks, > > Apache Hadoop has come a long way since our humble beginnings. As a > community we've made significant progress, even in 2011 - we've had 3 > releases off the branch-0.20-security (0.20.205 being the latest) and we > just released 0.23.0 last week, our first major release off trunk in a > while. > > With hadoop-0.20.205 we finally have an Apache release with both security > and HBase support, both critical for the growing ecosystem. > > With that, I think it's time to call it as hadoop-1.0. The 1.0 moniker has > something we've wanted for a while and I think it's time for us to just ship > it. Linus did something similar with GNU/Linux 3.0. > > Yes, we could add more features or better it along many dimensions (ala > hadoop-0.23), but right now we have a pretty decent piece of software i.e. > the feature set in hadoop-0.20.205 is compelling and widely used. We could > call hadoop-0.23 (or 0.22) as 2.0 etc. I do think we, as a community, can > support compatibility in the hadoop-1.x series, which is the essential > ingredient. This isn't a brand new idea, Doug suggested this a long while > ago. > > Thoughts? > > thanks, > Arun > > > +
-
Re: [DISCUSS] Apache Hadoop 1.0?Sharad Agarwal 2011-11-15, 05:37
+1
remembering and understanding current release numbering and attributing it to stable/compatible etc. is really painful. On Tue, Nov 15, 2011 at 3:41 AM, Arun C Murthy <[EMAIL PROTECTED]> wrote: > Folks, > > Apache Hadoop has come a long way since our humble beginnings. As a > community we've made significant progress, even in 2011 - we've had 3 > releases off the branch-0.20-security (0.20.205 being the latest) and we > just released 0.23.0 last week, our first major release off trunk in a > while. > > With hadoop-0.20.205 we finally have an Apache release with both security > and HBase support, both critical for the growing ecosystem. > > With that, I think it's time to call it as hadoop-1.0. The 1.0 moniker has > something we've wanted for a while and I think it's time for us to just > ship it. Linus did something similar with GNU/Linux 3.0. > > Yes, we could add more features or better it along many dimensions (ala > hadoop-0.23), but right now we have a pretty decent piece of software i.e. > the feature set in hadoop-0.20.205 is compelling and widely used. We could > call hadoop-0.23 (or 0.22) as 2.0 etc. I do think we, as a community, can > support compatibility in the hadoop-1.x series, which is the essential > ingredient. This isn't a brand new idea, Doug suggested this a long while > ago. > > Thoughts? > > thanks, > Arun > > > > +
-
Re: [DISCUSS] Apache Hadoop 1.0?Mahadev Konar 2011-11-15, 06:23
+1 for 0.20.2xx as 1.0.
mahadev On Mon, Nov 14, 2011 at 9:37 PM, Sharad Agarwal <[EMAIL PROTECTED]> wrote: > +1 > > remembering and understanding current release numbering and attributing it > to stable/compatible etc. is really painful. > > > On Tue, Nov 15, 2011 at 3:41 AM, Arun C Murthy <[EMAIL PROTECTED]> wrote: > >> Folks, >> >> Apache Hadoop has come a long way since our humble beginnings. As a >> community we've made significant progress, even in 2011 - we've had 3 >> releases off the branch-0.20-security (0.20.205 being the latest) and we >> just released 0.23.0 last week, our first major release off trunk in a >> while. >> >> With hadoop-0.20.205 we finally have an Apache release with both security >> and HBase support, both critical for the growing ecosystem. >> >> With that, I think it's time to call it as hadoop-1.0. The 1.0 moniker has >> something we've wanted for a while and I think it's time for us to just >> ship it. Linus did something similar with GNU/Linux 3.0. >> >> Yes, we could add more features or better it along many dimensions (ala >> hadoop-0.23), but right now we have a pretty decent piece of software i.e. >> the feature set in hadoop-0.20.205 is compelling and widely used. We could >> call hadoop-0.23 (or 0.22) as 2.0 etc. I do think we, as a community, can >> support compatibility in the hadoop-1.x series, which is the essential >> ingredient. This isn't a brand new idea, Doug suggested this a long while >> ago. >> >> Thoughts? >> >> thanks, >> Arun >> >> >> >> > +
-
Re: [DISCUSS] Apache Hadoop 1.0?Owen O'Malley 2011-11-15, 05:47
I think this is great. Thanks, Arun.
Since the 2xx line is clearly a major branch, we should designate it as 1.0. I don't think there is any need to rename current releases, so let's just rename the upcoming ones:
0.20.205.1 -> 1.0.0
0.20.206.0 -> 1.1.0
0.21 is dead and we should just leave it as 0.21.
If we want to leave space for a 0.22 release, it should be 2.0.0:
0.22.0 -> 2.0.0
And that would make the 0.23.x releases 3.x.y.
0.23.0 -> 3.0.0
-- Owen
-
Re: [DISCUSS] Apache Hadoop 1.0?Andreas Neumann 2011-11-15, 05:56
+1 for not renaming past releases, that would really start confusion.
If .20.20x.y corresponds to 1.z.y, then z=x-5 and:
0.20.205.1 -> 1.0.1
0.20.206.0 -> 1.1.0
-Andreas.
On 11/14/11 9:47 PM, "Owen O'Malley" <[EMAIL PROTECTED]> wrote:
>I think this is great. Thanks, Arun.
>
>Since the 2xx line is clearly a major branch, we should designate it as
>1.0. I don't think there is any need to rename current releases, so let's
>just rename the upcoming ones:
>
>0.20.205.1 -> 1.0.0
>0.20.206.0 -> 1.1.0
>
>0.21 is dead and we should just leave it as it as 0.21.
>
>If we want to leave space for a 0.22 release, it should be 2.0.0:
>
>0.22.0 -> 2.0.0
>
>And that would make the 0.23.x releases 3.x.y.
>
>0.23.0 -> 3.0.0
>
>-- Owen
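Andreas's rule (0.20.20x.y corresponds to 1.z.y with z = x - 5) can be written out as a small function. This is a sketch under assumptions: the name `to_one_x` is invented for illustration, and it only accepts versions of the exact form 0.20.20x.y with x >= 5, since the proposal covers upcoming releases only.

```python
import re

# Matches 0.20.20x.y, capturing the single digit x and the point number y.
_SECURITY_LINE = re.compile(r"^0\.20\.20(\d)\.(\d+)$")


def to_one_x(old: str) -> str:
    """Map 0.20.20x.y to 1.z.y with z = x - 5 (hypothetical helper)."""
    match = _SECURITY_LINE.match(old)
    if match is None:
        raise ValueError(f"not a 0.20.20x.y version: {old!r}")
    x, y = int(match.group(1)), int(match.group(2))
    z = x - 5
    if z < 0:
        # Releases before 0.20.205 are not renamed under this proposal.
        raise ValueError(f"{old!r} predates 0.20.205 and keeps its name")
    return f"1.{z}.{y}"


if __name__ == "__main__":
    for old in ("0.20.205.1", "0.20.206.0"):
        print(old, "->", to_one_x(old))
```

Under this rule 0.20.205.1 becomes 1.0.1 and 0.20.206.0 becomes 1.1.0, which matches the two examples in the message; Owen's variant differs only in mapping 0.20.205.1 to 1.0.0.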
-
Re: [DISCUSS] Apache Hadoop 1.0?Owen O'Malley 2011-11-15, 06:20
On Mon, Nov 14, 2011 at 9:56 PM, Andreas Neumann <[EMAIL PROTECTED]> wrote:
> +1 for not renaming past releases, that would really start confusion.
>
> If .20.20x.y corresponds to 1.z.y, then z=x-5 and:
>
> 0.20.205.1 -> 1.0.1
> 0.20.206.0 -> 1.1.0
>
Eli had said previously (and I agree) that 205.1 seems overly aggressive for a point release and would be better named as a minor release. But 0.20.205.1 going to either 1.0.0 or 1.0.1 is much much better than the current state.
-- Owen
-
Re: [DISCUSS] Apache Hadoop 1.0?Dhruba Borthakur 2011-11-15, 06:07
+1 to making the upcoming 0.23 release as 2.0.
-dhruba On Mon, Nov 14, 2011 at 9:47 PM, Owen O'Malley <[EMAIL PROTECTED]> wrote: > I think this is great. Thanks, Arun. > > Since the 2xx line is clearly a major branch, we should designate it as > 1.0. I don't think there is any need to rename current releases, so let's > just rename the upcoming ones: > > 0.20.205.1 -> 1.0.0 > 0.20.206.0 -> 1.1.0 > > 0.21 is dead and we should just leave it as it as 0.21. > > If we want to leave space for a 0.22 release, it should be 2.0.0: > > 0.22.0 -> 2.0.0 > > And that would make the 0.23.x releases 3.x.y. > > 0.23.0 -> 3.0.0 > > -- Owen > -- Subscribe to my posts at http://www.facebook.com/dhruba +
-
Re: [DISCUSS] Apache Hadoop 1.0?Steve Loughran 2011-11-15, 09:57
On 15/11/11 06:07, Dhruba Borthakur wrote:
> +1 to making the upcoming 0.23 release as 2.0. > +1 And leave the 0.20.20x chain as is, just because people are used to it +
-
Re: [DISCUSS] Apache Hadoop 1.0?Todd Lipcon 2011-11-15, 21:43
On Tue, Nov 15, 2011 at 1:57 AM, Steve Loughran <[EMAIL PROTECTED]> wrote:
> On 15/11/11 06:07, Dhruba Borthakur wrote:
>>
>> +1 to making the upcoming 0.23 release as 2.0.
>>
>
> +1
>
> And leave the 0.20.20x chain as is, just because people are used to it
>
+1 to Steve's proposal. Renaming 0.20 is too big a pain at this point. Though it's weird to never have a 1.0, the "0.20" name is well ingrained, and I think renaming it at this point will cause a lot of confusion (plus cause problems for downstream projects like Hive and HBase which use regexes against the version string in various shim layers)
-Todd
--
Todd Lipcon
Software Engineer, Cloudera
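Todd's point about shim layers can be illustrated with a toy version check. The code below is not taken from Hive or HBase; `pick_shim`, the shim names, and the regex are all hypothetical, and only show the kind of pattern match that stops working once the same code line is published as 1.0.0.

```python
import re

# Hypothetical pattern for the 0.20.2xx security line (e.g. 0.20.205.0).
_SECURE_020 = re.compile(r"^0\.20\.2\d\d(\.\d+)?")


def pick_shim(version: str) -> str:
    """Choose a compatibility shim from the Hadoop version string (toy example)."""
    if _SECURE_020.match(version):
        return "secure-0.20"
    if version.startswith("0.20"):
        return "0.20"
    raise ValueError(f"unrecognized Hadoop version: {version}")


if __name__ == "__main__":
    print(pick_shim("0.20.205.0"))
    try:
        pick_shim("1.0.0")
    except ValueError as err:
        # The renamed release no longer matches any known pattern.
        print("breaks:", err)
```

A downstream project written this way would need a new pattern, and a release of its own, before it could run against a renamed 1.0.0.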
-
Re: [DISCUSS] Apache Hadoop 1.0?Owen O'Malley 2011-11-15, 22:17
On Tue, Nov 15, 2011 at 1:43 PM, Todd Lipcon <[EMAIL PROTECTED]> wrote:
> On Tue, Nov 15, 2011 at 1:57 AM, Steve Loughran <[EMAIL PROTECTED]> wrote: > > On 15/11/11 06:07, Dhruba Borthakur wrote: > >> > >> +1 to making the upcoming 0.23 release as 2.0. > >> > > > > +1 > > > > And leave the 0.20.20x chain as is, just because people are used to it > > > > +1 to Steve's proposal. Renaming 0.20 is too big a pain at this point. > I really don't see it that way. I'm continuing (up to and including last week) to have to explain the version numbering for 0.20, 0.20.2xx, 0.21, 0.22, and 0.23. Obviously the people who are willing to do the work don't feel that it is a waste of time or they wouldn't be signing up to do the work. -- Owen +
-
Re: [DISCUSS] Apache Hadoop 1.0?Ted Dunning 2011-11-15, 22:25
On Tue, Nov 15, 2011 at 2:17 PM, Owen O'Malley <[EMAIL PROTECTED]> wrote:
> On Tue, Nov 15, 2011 at 1:43 PM, Todd Lipcon <[EMAIL PROTECTED]> wrote: > > > On Tue, Nov 15, 2011 at 1:57 AM, Steve Loughran <[EMAIL PROTECTED]> > wrote: > > > On 15/11/11 06:07, Dhruba Borthakur wrote: > > >> > > >> +1 to making the upcoming 0.23 release as 2.0. > > >> > > > > > > +1 > > > > > > And leave the 0.20.20x chain as is, just because people are used to it > > > > > > > +1 to Steve's proposal. Renaming 0.20 is too big a pain at this point. > > > > I really don't see it that way. I'm continuing (up to and including last > week) to have to explain the version numbering for 0.20, 0.20.2xx, 0.21, > 0.22, and 0.23. Obviously the people who are willing to do the work don't > feel that it is a waste of time or they wouldn't be signing up to do the > work. This smells like Java 1.4 versus Java 6 all over again. Explaining why 0.20 became 1.0 when 0.21 didn't become anything is a pretty strange exercise. If a marketing person somewhere is totally set on making 0.23 be 2.0, just do that and be done. There doesn't have to be a 1.0 version. +
-
Re: [DISCUSS] Apache Hadoop 1.0?Arun C Murthy 2011-11-15, 22:32
I don't see this as 'renaming', I propose we just look forward and make the next release from branch-0.20-security as 1.0 to keep things simple.
IMHO, going back to rename existing releases (0.21 etc.) isn't productive. Arun On Nov 15, 2011, at 1:43 PM, Todd Lipcon wrote: > On Tue, Nov 15, 2011 at 1:57 AM, Steve Loughran <[EMAIL PROTECTED]> wrote: >> On 15/11/11 06:07, Dhruba Borthakur wrote: >>> >>> +1 to making the upcoming 0.23 release as 2.0. >>> >> >> +1 >> >> And leave the 0.20.20x chain as is, just because people are used to it >> > > +1 to Steve's proposal. Renaming 0.20 is too big a pain at this point. > Though it's weird to never have a 1.0, the "0.20" name is well > ingrained, and I think renaming it at this point will cause a lot of > confusion (plus cause problems for downstream projects like Hive and > HBase which use regexes against the version string in various shim > layers) > > -Todd > -- > Todd Lipcon > Software Engineer, Cloudera +
-
Re: [DISCUSS] Apache Hadoop 1.0?Luke Lu 2011-11-15, 22:40
+1 on *new* releases from 0.20.2xx branches as 1.x; 0.22 branch as 2.x
and 0.23/24 branches as 3.x. On Tue, Nov 15, 2011 at 2:32 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote: > I don't see this as 'renaming', I propose we just look forward and make the next release from branch-0.20-security as 1.0 to keep things simple. > > IMHO, going back to rename existing releases (0.21 etc.) isn't productive. > > Arun > > On Nov 15, 2011, at 1:43 PM, Todd Lipcon wrote: > >> On Tue, Nov 15, 2011 at 1:57 AM, Steve Loughran <[EMAIL PROTECTED]> wrote: >>> On 15/11/11 06:07, Dhruba Borthakur wrote: >>>> >>>> +1 to making the upcoming 0.23 release as 2.0. >>>> >>> >>> +1 >>> >>> And leave the 0.20.20x chain as is, just because people are used to it >>> >> >> +1 to Steve's proposal. Renaming 0.20 is too big a pain at this point. >> Though it's weird to never have a 1.0, the "0.20" name is well >> ingrained, and I think renaming it at this point will cause a lot of >> confusion (plus cause problems for downstream projects like Hive and >> HBase which use regexes against the version string in various shim >> layers) >> >> -Todd >> -- >> Todd Lipcon >> Software Engineer, Cloudera > > +
-
Re: [DISCUSS] Apache Hadoop 1.0?Doug Cutting 2011-11-16, 01:37
On 11/15/2011 01:43 PM, Todd Lipcon wrote:
> +1 to Steve's proposal. Renaming 0.20 is too big a pain at this point.
Everyone seems to agree that we should rename 0.23 to either 2.0 or 3.0. There are a number of different views about what to do with 0.20, 0.21 and 0.22. So maybe we should proceed where there's consensus and not argue extensively where there's disagreement?
Since 0.23 has little install base yet it should be easy to rename. If we're not going to rename 0.20, 0.21 or 0.22 releases then 3.0 seems inappropriate.
Can we agree to 0.23 -> 2.0? That's consistent with the MR2 nomenclature.
Doug
-
Re: [DISCUSS] Apache Hadoop 1.0?Ahmed Radwan 2011-11-16, 01:41
+1
> Can we agree to 0.23 -> 2.0? That's consistent with the MR2 nomenclature. Best Regards Ahmed On Tue, Nov 15, 2011 at 5:37 PM, Doug Cutting <[EMAIL PROTECTED]> wrote: > On 11/15/2011 01:43 PM, Todd Lipcon wrote: >> +1 to Steve's proposal. Renaming 0.20 is too big a pain at this point. > > Everyone seems to agree that we should rename 0.23 to either 2.0 or 3.0. > There are a number of different views about what to do with 0.20, 0.21 > and 0.22. So maybe we should proceed where there's consensus and not > argue extensively where there's disagreement? > > Since 0.23 has little install base yet it should be easy to rename. If > we're not going to rename 0.20, 0.21 or 0.22 releases then 3.0 seems > inappropriate. > > Can we agree to 0.23 -> 2.0? That's consistent with the MR2 nomenclature. > > Doug > +
-
Re: [DISCUSS] Apache Hadoop 1.0?Eli Collins 2011-11-16, 01:49
On Tue, Nov 15, 2011 at 5:37 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:
> On 11/15/2011 01:43 PM, Todd Lipcon wrote:
>> +1 to Steve's proposal. Renaming 0.20 is too big a pain at this point.
>
> Everyone seems to agree that we should rename 0.23 to either 2.0 or 3.0.
> There are a number of different views about what to do with 0.20, 0.21
> and 0.22. So maybe we should proceed where there's consensus and not
> argue extensively where there's disagreement?
>
> Since 0.23 has little install base yet it should be easy to rename. If
> we're not going to rename 0.20, 0.21 or 0.22 releases then 3.0 seems
> inappropriate.
>
> Can we agree to 0.23 -> 2.0? That's consistent with the MR2 nomenclature.
>
Are you suggesting a two part version scheme? Ie
0.23.0 -> 2.0
0.23.1 -> 2.1
I'm +1 to that.
fwiw I'd map 0.20.200.0 to 1.0, 203.0 would be 1.3, 205.0, would be 1.5. I wouldn't rename 21 since we've abandoned it. I wouldn't rename 22 either since it both has features that are in 20x, and 20x has features not in 22, and is not yet released or stable. Seems hard to come up with a reasonable version number for it.
Thanks,
Eli
-
Re: [DISCUSS] Apache Hadoop 1.0?Doug Cutting 2011-11-16, 01:56
On 11/15/2011 05:49 PM, Eli Collins wrote:
> Are you suggesting a two part version scheme? Ie
>
> 0.23.0 -> 2.0
> 0.23.1 -> 2.1
I didn't specify. We could either do that or:
0.23.0 -> 2.0.0
0.23.1 -> 2.0.1
...
0.24.0 -> 2.1.0
...
I don't care which much. Do you?
> fwiw I'd map 0.20.200.0 to 1.0, 203.0 would be 1.3, 205.0, would be
> 1.5. I wouldn't rename 21 since we've abandoned it. I wouldn't rename
> 22 either since it both has features that are in 20x, and 20x has
> features not in 22, and is not yet released or stable. Seems hard to
> come up with a reasonable version number for it.
This is about the fourth or fifth different proposal around these. I'm not sure things are congealing around a consensus. I don't want to stand in the way of that, but I think we might first settle the part that we're nearer consensus on.
Doug
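The difference between the two-part and three-part schemes discussed above is easiest to see side by side. A minimal sketch, assuming hypothetical helpers `two_part` and `three_part`; extending the three-part rule past 0.24 (minor = old minor - 23) is an extrapolation of the examples given, not something stated in the thread.

```python
def two_part(old: str) -> str:
    """Two-part reading: 0.23.N -> 2.N (hypothetical helper)."""
    major, minor, patch = old.split(".")
    if (major, minor) != ("0", "23"):
        raise ValueError(f"two-part scheme only covers 0.23.N, got {old!r}")
    return f"2.{patch}"


def three_part(old: str) -> str:
    """Three-part reading: 0.23.N -> 2.0.N, 0.24.N -> 2.1.N (hypothetical helper)."""
    major, minor, patch = old.split(".")
    if major != "0" or int(minor) < 23:
        raise ValueError(f"three-part scheme starts at 0.23, got {old!r}")
    # Assumption: later 0.x series keep counting up as 2.(x - 23).
    return f"2.{int(minor) - 23}.{patch}"


if __name__ == "__main__":
    for old in ("0.23.0", "0.23.1", "0.24.0"):
        print(old, "->", three_part(old))
```

The two-part scheme has no slot left for 0.24, whereas the three-part scheme keeps room for both sustaining releases of 0.23 (2.0.N) and the next series (2.1.N).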
-
Re: [DISCUSS] Apache Hadoop 1.0?Eli Collins 2011-11-16, 02:03
On Tue, Nov 15, 2011 at 5:56 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:
> On 11/15/2011 05:49 PM, Eli Collins wrote: >> Are you suggesting a two part version scheme? Ie >> >> 0.23.0 -> 2.0 >> 0.23.1 -> 2.1 > > I didn't specify. We could either do that or: > > 0.23.0 -> 2.0.0 > 0.23.1 -> 2.0.1 > ... > 0.24.0 -> 2.1.0 > ... > > I don't care which much. Do you? > Nope. Sticking with the three part scheme seems reasonable since we'll eventually do sustaining releases of 23. +1 to your scheme above. Thanks, Eli +
-
Re: [DISCUSS] Apache Hadoop 1.0?Arun Murthy 2011-11-16, 04:20
On Nov 15, 2011, at 6:03 PM, Eli Collins <[EMAIL PROTECTED]> wrote:
> On Tue, Nov 15, 2011 at 5:56 PM, Doug Cutting <[EMAIL PROTECTED]> wrote: >> On 11/15/2011 05:49 PM, Eli Collins wrote: >>> Are you suggesting a two part version scheme? Ie >>> >>> 0.23.0 -> 2.0 >>> 0.23.1 -> 2.1 >> >> I didn't specify. We could either do that or: >> >> 0.23.0 -> 2.0.0 >> 0.23.1 -> 2.0.1 >> ... >> 0.24.0 -> 2.1.0 >> ... >> >> I don't care which much. Do you? >> > > Nope. Sticking with the three part scheme seems reasonable since we'll > eventually do sustaining releases of 23. > > +1 to your scheme above. +1 for three part scheme. Arun > > Thanks, > Eli +
-
Re: [DISCUSS] Apache Hadoop 1.0?Matt Foley 2011-11-16, 02:42
I agree with some prior posters that renaming the 0.20-security sustaining
branch could be confusing.
How about the following (pseudo-code)?

## Just before we are ready to make rc0 for release 0.20.205.1, do:
svn copy branch-0.20-security-205 branch-1.0
## and actually release it from branch-1.0 as release 1.0.0

## Then, after the 1.0.0 release vote ends successfully, do:
svn copy branch-0.20-security branch-1.1
## This will pick up the remaining changes done to date, which would
## have gone into 0.20.206.0, and will instead go into release 1.1.0,
## sometime in the future

## However, since branch-0.23 was just recently split from trunk, it should be
## upgraded to 2.0 in the usual way, with a rename:
svn mv branch-0.23 branch-2.0
## and also rename the actual release:
svn mv tags/release-0.23.0 tags/release-2.0.0
## The work currently going into the future 0.23.1 will become 2.0.1, not 2.1.0.
## Work going into trunk will become 2.1 or higher in the future.

This is a concrete, actionable proposal. In an effort to establish
consensus, would it be appropriate to call a vote on it?
--Matt

On Tue, Nov 15, 2011 at 5:56 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:

> On 11/15/2011 05:49 PM, Eli Collins wrote:
> > Are you suggesting a two part version scheme? Ie
> >
> > 0.23.0 -> 2.0
> > 0.23.1 -> 2.1
>
> I didn't specify. We could either do that or:
>
> 0.23.0 -> 2.0.0
> 0.23.1 -> 2.0.1
> ...
> 0.24.0 -> 2.1.0
> ...
>
> I don't care which much. Do you?
>
> > fwiw I'd map 0.20.200.0 to 1.0, 203.0 would be 1.3, 205.0, would be
> > 1.5. I wouldn't rename 21 since we've abandoned it. I wouldn't rename
> > 22 either since it both has features that are in 20x, and 20x has
> > features not in 22, and is not yet released or stable. Seems hard to
> > come up with a reasonable version number for it.
>
> This is about the fourth or fifth different proposal around these. I'm
> not sure things are congealing around a consensus. I don't want to
> stand in the way of that, but I think we might first settle the part
> that we're nearer consensus on.
>
> Doug
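For readers who want the renames in the proposal above in one place, here is a minimal Python sketch. It encodes only the mappings the message spells out (0.20.205.1 to 1.0.0, 0.20.206.0 to 1.1.0, and 0.23.N to 2.0.N); the names `PROPOSED_RENAMES` and `proposed_name` are hypothetical and are not part of any Hadoop tooling.

```python
# Hypothetical helper: covers only the renames stated in the proposal above.
PROPOSED_RENAMES = {
    "0.20.205.1": "1.0.0",  # next release from branch-0.20-security-205
    "0.20.206.0": "1.1.0",  # remaining changes on branch-0.20-security
}


def proposed_name(old: str) -> str:
    """Return the release name the proposal would assign to `old`."""
    if old in PROPOSED_RENAMES:
        return PROPOSED_RENAMES[old]
    parts = old.split(".")
    # branch-0.23 becomes branch-2.0, so 0.23.N maps to 2.0.N.
    if len(parts) == 3 and parts[:2] == ["0", "23"]:
        return f"2.0.{int(parts[2])}"
    raise ValueError(f"the proposal does not say how to rename {old}")


if __name__ == "__main__":
    for old in ("0.20.205.1", "0.20.206.0", "0.23.0", "0.23.1"):
        print(old, "->", proposed_name(old))
```

The sketch deliberately raises an error for 0.21 and 0.22, since the proposal does not assign them new names.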
Re: [DISCUSS] Apache Hadoop 1.0? Konstantin Boudnik 2011-11-16, 02:47
And once again - 0.22 seems to be forgotten for an unexplained reason.

I urge us to stick to Arun's original proposal and use 0.22 as 2.0.
With that correction, I like the following proposal.

Cos

On Tue, Nov 15, 2011 at 06:42PM, Matt Foley wrote:
> I agree with some prior posters that renaming the 0.20-security sustaining
> branch could be confusing.
> How about the following (pseudo-code)?
>
> ## Just before we are ready to make rc0 for release 0.20.205.1, do:
> svn copy branch-0.20-security-205 branch-1.0
> ## and actually release it from branch-1.0 as release 1.0.0
>
> ## Then, after the 1.0.0 release vote ends successfully, do:
> svn copy branch-0.20-security branch-1.1
> ## This will pick up the remaining changes done to date, which would
> ## have gone into 0.20.206.0, and will instead go into release 1.1.0,
> ## sometime in the future
>
> ## However, since branch-0.23 was just recently split from trunk, it should be
> ## upgraded to 2.0 in the usual way, with a rename:
> svn mv branch-0.23 branch-2.0
> ## and also rename the actual release:
> svn mv tags/release-0.23.0 tags/release-2.0.0
> ## The work currently going into the future 0.23.1 will become 2.0.1, not 2.1.0.
> ## Work going into trunk will become 2.1 or higher in the future.
>
> This is a concrete, actionable proposal. In an effort to establish
> consensus, would it be appropriate to call a vote on it?
> --Matt
>
> On Tue, Nov 15, 2011 at 5:56 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:
>
> > On 11/15/2011 05:49 PM, Eli Collins wrote:
> > > Are you suggesting a two part version scheme? Ie
> > >
> > > 0.23.0 -> 2.0
> > > 0.23.1 -> 2.1
> >
> > I didn't specify. We could either do that or:
> >
> > 0.23.0 -> 2.0.0
> > 0.23.1 -> 2.0.1
> > ...
> > 0.24.0 -> 2.1.0
> > ...
> >
> > I don't care which much. Do you?
> >
> > > fwiw I'd map 0.20.200.0 to 1.0, 203.0 would be 1.3, 205.0, would be
> > > 1.5. I wouldn't rename 21 since we've abandoned it. I wouldn't rename
> > > 22 either since it both has features that are in 20x, and 20x has
> > > features not in 22, and is not yet released or stable. Seems hard to
> > > come up with a reasonable version number for it.
> >
> > This is about the fourth or fifth different proposal around these. I'm
> > not sure things are congealing around a consensus. I don't want to
> > stand in the way of that, but I think we might first settle the part
> > that we're nearer consensus on.
> >
> > Doug
Re: [DISCUSS] Apache Hadoop 1.0? Joe Stein 2011-11-16, 03:35
Consistency between supported branches and releases from trunk in some
logical order would be helpful for those outside of the community coming in,
labeled however works best for the active community. My 0.235689 cents.

/*
Joe Stein
http://www.medialets.com
Twitter: @allthingshadoop
*/

On Nov 15, 2011, at 9:47 PM, Konstantin Boudnik <[EMAIL PROTECTED]> wrote:

> And once again - 0.22 seems to be forgotten for an unexplained reason.
>
> I urge to stick to original Arun's proposal and use 0.22 as 2.0
> With the correction I like the following proposal.
>
> Cos
>
> On Tue, Nov 15, 2011 at 06:42PM, Matt Foley wrote:
>> I agree with some prior posters that renaming the 0.20-security sustaining
>> branch could be confusing.
>> How about the following (pseudo-code)?
>>
>> ## Just before we are ready to make rc0 for release 0.20.205.1, do:
>> svn copy branch-0.20-security-205 branch-1.0
>> ## and actually release it from branch-1.0 as release 1.0.0
>>
>> ## Then, after the 1.0.0 release vote ends successfully, do:
>> svn copy branch-0.20-security branch-1.1
>> ## This will pick up the remaining changes done to date, which would
>> ## have gone into 0.20.206.0, and will instead go into release 1.1.0,
>> ## sometime in the future
>>
>> ## However, since branch-0.23 was just recently split from trunk, it should be
>> ## upgraded to 2.0 in the usual way, with a rename:
>> svn mv branch-0.23 branch-2.0
>> ## and also rename the actual release:
>> svn mv tags/release-0.23.0 tags/release-2.0.0
>> ## The work currently going into the future 0.23.1 will become 2.0.1, not 2.1.0.
>> ## Work going into trunk will become 2.1 or higher in the future.
>>
>> This is a concrete, actionable proposal. In an effort to establish
>> consensus, would it be appropriate to call a vote on it?
>> --Matt
>>
>> On Tue, Nov 15, 2011 at 5:56 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:
>>
>>> On 11/15/2011 05:49 PM, Eli Collins wrote:
>>>> Are you suggesting a two part version scheme? Ie
>>>>
>>>> 0.23.0 -> 2.0
>>>> 0.23.1 -> 2.1
>>>
>>> I didn't specify. We could either do that or:
>>>
>>> 0.23.0 -> 2.0.0
>>> 0.23.1 -> 2.0.1
>>> ...
>>> 0.24.0 -> 2.1.0
>>> ...
>>>
>>> I don't care which much. Do you?
>>>
>>>> fwiw I'd map 0.20.200.0 to 1.0, 203.0 would be 1.3, 205.0, would be
>>>> 1.5. I wouldn't rename 21 since we've abandoned it. I wouldn't rename
>>>> 22 either since it both has features that are in 20x, and 20x has
>>>> features not in 22, and is not yet released or stable. Seems hard to
>>>> come up with a reasonable version number for it.
>>>
>>> This is about the fourth or fifth different proposal around these. I'm
>>> not sure things are congealing around a consensus. I don't want to
>>> stand in the way of that, but I think we might first settle the part
>>> that we're nearer consensus on.
>>>
>>> Doug
Re: [DISCUSS] Apache Hadoop 1.0? Konstantin Shvachko 2011-11-16, 07:26
Consistency of naming the releases is a very valid point and should be
the main concern in the decision making.
If 0.20.205 is called Hadoop 1, and 0.23 called Hadoop 2, then
releasing 0.22 under 0.22 will be confusing.
If we vote only on renaming 0.20.205 to 1.0 then the 0.23 release
becomes confusing, as well as the upcoming 0.22 release.

I think there is value in all three branches. I also think the three
have substantial differences, so treating them as separate bases makes
sense to me. Presumably they will evolve more or less independently for
some time at least.

So I'd support the proposition (I think it was Doug's) to
1. Call the next release off the 0.20-security branch 1.0.0
2. Call the next release off the 0.22 branch 2.0.0
3. Call the next release off the 0.23 branch 3.0.0

We do not need to decide now whether 0.20.206 will be 1.1.0 or 1.0.1. That
should be decided when the release following 1.0.0 is voted in, based on
the amount of changes introduced.
Since 0.23 has just been released, a rename of 0.23 to 3.0.0 would work
for me as well.

Thanks,
--Konstantin

On Tue, Nov 15, 2011 at 7:35 PM, Joe Stein <[EMAIL PROTECTED]> wrote:
> Consistency between supported branches and releases from trunk in some
> logical order would be helpful for those outside of the community coming in,
> labeled however works best for the active community. My 0.235689 cents.
>
> /*
> Joe Stein
> http://www.medialets.com
> Twitter: @allthingshadoop
> */
>
> On Nov 15, 2011, at 9:47 PM, Konstantin Boudnik <[EMAIL PROTECTED]> wrote:
>
>> And once again - 0.22 seems to be forgotten for an unexplained reason.
>>
>> I urge to stick to original Arun's proposal and use 0.22 as 2.0
>> With the correction I like the following proposal.
>>
>> Cos
>>
>> On Tue, Nov 15, 2011 at 06:42PM, Matt Foley wrote:
>>> I agree with some prior posters that renaming the 0.20-security sustaining
>>> branch could be confusing.
>>> How about the following (pseudo-code)?
>>>
>>> ## Just before we are ready to make rc0 for release 0.20.205.1, do:
>>> svn copy branch-0.20-security-205 branch-1.0
>>> ## and actually release it from branch-1.0 as release 1.0.0
>>>
>>> ## Then, after the 1.0.0 release vote ends successfully, do:
>>> svn copy branch-0.20-security branch-1.1
>>> ## This will pick up the remaining changes done to date, which would
>>> ## have gone into 0.20.206.0, and will instead go into release 1.1.0,
>>> ## sometime in the future
>>>
>>> ## However, since branch-0.23 was just recently split from trunk, it should be
>>> ## upgraded to 2.0 in the usual way, with a rename:
>>> svn mv branch-0.23 branch-2.0
>>> ## and also rename the actual release:
>>> svn mv tags/release-0.23.0 tags/release-2.0.0
>>> ## The work currently going into the future 0.23.1 will become 2.0.1, not 2.1.0.
>>> ## Work going into trunk will become 2.1 or higher in the future.
>>>
>>> This is a concrete, actionable proposal. In an effort to establish
>>> consensus, would it be appropriate to call a vote on it?
>>> --Matt
>>>
>>> On Tue, Nov 15, 2011 at 5:56 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:
>>>
>>>> On 11/15/2011 05:49 PM, Eli Collins wrote:
>>>>> Are you suggesting a two part version scheme? Ie
>>>>>
>>>>> 0.23.0 -> 2.0
>>>>> 0.23.1 -> 2.1
>>>>
>>>> I didn't specify. We could either do that or:
>>>>
>>>> 0.23.0 -> 2.0.0
>>>> 0.23.1 -> 2.0.1
>>>> ...
>>>> 0.24.0 -> 2.1.0
>>>> ...
>>>>
>>>> I don't care which much. Do you?
>>>>
>>>>> fwiw I'd map 0.20.200.0 to 1.0, 203.0 would be 1.3, 205.0, would be
>>>>> 1.5. I wouldn't rename 21 since we've abandoned it. I wouldn't rename
>>>>> 22 either since it both has features that are in 20x, and 20x has
>>>>> features not in 22, and is not yet released or stable. Seems hard to
>>>>> come up with a reasonable version number for it.
>>>>
>>>> This is about the fourth or fifth different proposal around these. I'm
>>>> not sure things are congealing around a consensus. I don't want to
>>>> stand in the way of that, but I think we might first settle the part
Re: [DISCUSS] Apache Hadoop 1.0? Konstantin Shvachko 2011-11-16, 15:46
A little wider perspective on where the renaming takes us and why it
is happening. My opinion.

Last year around this same time the Hadoop project was on the verge of
splitting. We had three "commercial" versions of Hadoop competing to be
the "real" Hadoop, while the officially released Apache version was
outdated. The ASF did an [amazingly] good job fencing off the claims for
external ownership of the Hadoop name, which effectively stopped the split
the way it was evolving.
The danger of the External Project Split has passed: now the others can
call their stuff XYZ-DH7 and be done with it.

This fall a danger of Internal Project Split has emerged, because three
versions were brewing independently. I call it a danger because more
versions of Hadoop means splitting and spreading the resources of the
community, including the (rapidly growing) software stack above. It also
means a stronger story for competing technologies. Which could be good, or
bad, or both.

The question is why the project falls into Splitting danger every fall.
My answer is it's the "Forever-20" syndrome. In the last several years
there was always a "reason" to continue with 0.20. Mostly because
businesses need to commit to a version for the next year in fall. This is
irrelevant to an open source project's development and contradicts its
natural straightforward motion.

Like many of you, I was at Hadoop World and ApacheCon last week and saw
a lot (I mean thousands) of people, enthusiastic about the technology, but
majorly confused about the versions.

My concern is that the rename of 0.20.205 to 1.0 means the community will
be stuck with it even longer, leading to the "Occupy Hadoop" movement
camping in the Apache Extras park.
I would have expected the RM of 0.23 to advocate calling it 1.0, but it
didn't happen.

Renaming branches is not a big deal. The problem is that there is no
consolidating version on the horizon.
I'll be glad to be wrong.

--Konstantin
Re: [DISCUSS] Apache Hadoop 1.0? Scott Carey 2011-11-16, 07:06
On 11/15/11 6:47 PM, "Konstantin Boudnik" <[EMAIL PROTECTED]> wrote:

>And once again - 0.22 seems to be forgotten for an unexplained reason.
>
>I urge to stick to original Arun's proposal and use 0.22 as 2.0
>With the correction I like the following proposal.

If 0.20.20x ends up in the 1.0.x line, then 0.22.x should end up in the
1.1.x line, IMO.

0.22 is not a radical incompatible overhaul from 0.20.20x. So IMO it
should not change the major version number, but only the minor one.
However, 0.23 IS a major change, and could justify a 2.0.x.

This all assumes the version numbers are going to start meaning something
along the lines of major.minor.patch where
- changes to major denote big, incompatible changes,
- changes to minor denote large changes/additions/improvements, but
  backwards compatible
- changes to patch denote bugfixes or minor additions/improvements with no
  compatibility impact.

-Scott

>
>Cos
>
>On Tue, Nov 15, 2011 at 06:42PM, Matt Foley wrote:
>> I agree with some prior posters that renaming the 0.20-security sustaining
>> branch could be confusing.
>> How about the following (pseudo-code)?
>>
>> ## Just before we are ready to make rc0 for release 0.20.205.1, do:
>> svn copy branch-0.20-security-205 branch-1.0
>> ## and actually release it from branch-1.0 as release 1.0.0
>>
>> ## Then, after the 1.0.0 release vote ends successfully, do:
>> svn copy branch-0.20-security branch-1.1
>> ## This will pick up the remaining changes done to date, which would
>> ## have gone into 0.20.206.0, and will instead go into release 1.1.0,
>> ## sometime in the future
>>
>> ## However, since branch-0.23 was just recently split from trunk, it should be
>> ## upgraded to 2.0 in the usual way, with a rename:
>> svn mv branch-0.23 branch-2.0
>> ## and also rename the actual release:
>> svn mv tags/release-0.23.0 tags/release-2.0.0
>> ## The work currently going into the future 0.23.1 will become 2.0.1, not 2.1.0.
>> ## Work going into trunk will become 2.1 or higher in the future.
>>
>> This is a concrete, actionable proposal. In an effort to establish
>> consensus, would it be appropriate to call a vote on it?
>> --Matt
>>
>> On Tue, Nov 15, 2011 at 5:56 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:
>>
>> > On 11/15/2011 05:49 PM, Eli Collins wrote:
>> > > Are you suggesting a two part version scheme? Ie
>> > >
>> > > 0.23.0 -> 2.0
>> > > 0.23.1 -> 2.1
>> >
>> > I didn't specify. We could either do that or:
>> >
>> > 0.23.0 -> 2.0.0
>> > 0.23.1 -> 2.0.1
>> > ...
>> > 0.24.0 -> 2.1.0
>> > ...
>> >
>> > I don't care which much. Do you?
>> >
>> > > fwiw I'd map 0.20.200.0 to 1.0, 203.0 would be 1.3, 205.0, would be
>> > > 1.5. I wouldn't rename 21 since we've abandoned it. I wouldn't rename
>> > > 22 either since it both has features that are in 20x, and 20x has
>> > > features not in 22, and is not yet released or stable. Seems hard to
>> > > come up with a reasonable version number for it.
>> >
>> > This is about the fourth or fifth different proposal around these. I'm
>> > not sure things are congealing around a consensus. I don't want to
>> > stand in the way of that, but I think we might first settle the part
>> > that we're nearer consensus on.
>> >
>> > Doug
Re: [DISCUSS] Apache Hadoop 1.0? Arun Murthy 2011-11-16, 04:14
I think this discussion is getting too wide, can we tease them apart?

Do we agree we should call the forthcoming releases off
branch-0.20-security as 1.x.x?

Let me start a vote for just that.

Arun

Sent from my iPhone

On Nov 15, 2011, at 6:43 PM, Matt Foley <[EMAIL PROTECTED]> wrote:

> I agree with some prior posters that renaming the 0.20-security sustaining
> branch could be confusing.
> How about the following (pseudo-code)?
>
> ## Just before we are ready to make rc0 for release 0.20.205.1, do:
> svn copy branch-0.20-security-205 branch-1.0
> ## and actually release it from branch-1.0 as release 1.0.0
>
> ## Then, after the 1.0.0 release vote ends successfully, do:
> svn copy branch-0.20-security branch-1.1
> ## This will pick up the remaining changes done to date, which would
> ## have gone into 0.20.206.0, and will instead go into release 1.1.0,
> ## sometime in the future
>
> ## However, since branch-0.23 was just recently split from trunk, it should be
> ## upgraded to 2.0 in the usual way, with a rename:
> svn mv branch-0.23 branch-2.0
> ## and also rename the actual release:
> svn mv tags/release-0.23.0 tags/release-2.0.0
> ## The work currently going into the future 0.23.1 will become 2.0.1, not 2.1.0.
> ## Work going into trunk will become 2.1 or higher in the future.
>
> This is a concrete, actionable proposal. In an effort to establish
> consensus, would it be appropriate to call a vote on it?
> --Matt
>
> On Tue, Nov 15, 2011 at 5:56 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:
>
>> On 11/15/2011 05:49 PM, Eli Collins wrote:
>>> Are you suggesting a two part version scheme? Ie
>>>
>>> 0.23.0 -> 2.0
>>> 0.23.1 -> 2.1
>>
>> I didn't specify. We could either do that or:
>>
>> 0.23.0 -> 2.0.0
>> 0.23.1 -> 2.0.1
>> ...
>> 0.24.0 -> 2.1.0
>> ...
>>
>> I don't care which much. Do you?
>>
>>> fwiw I'd map 0.20.200.0 to 1.0, 203.0 would be 1.3, 205.0, would be
>>> 1.5. I wouldn't rename 21 since we've abandoned it. I wouldn't rename
>>> 22 either since it both has features that are in 20x, and 20x has
>>> features not in 22, and is not yet released or stable. Seems hard to
>>> come up with a reasonable version number for it.
>>
>> This is about the fourth or fifth different proposal around these. I'm
>> not sure things are congealing around a consensus. I don't want to
>> stand in the way of that, but I think we might first settle the part
>> that we're nearer consensus on.
>>
>> Doug
Re: [DISCUSS] Apache Hadoop 1.0? Eli Collins 2011-11-16, 04:51
On Tue, Nov 15, 2011 at 8:14 PM, Arun Murthy <[EMAIL PROTECTED]> wrote:
> I think this discussion is getting too wide, can we tease them apart?
>
> Do we agree we should call the forthcoming releases off
> branch-0.20-security as 1.x.x?
>
> Let me start a vote for just that.

+1

IMO the values of x.x should match the current dot versions, e.g.
0.20.206.0 would be 1.6.0.

Thanks,
Eli
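The "carry over the dot versions" idea in the message above can be sketched in a few lines of Python. This is only an illustration of the mapping as described (0.20.206.0 becomes 1.6.0); the function name `carry_over` is hypothetical, and the assumption that the third component is always 200 or higher comes from the 0.20.2xx naming used in the thread.

```python
def carry_over(old: str) -> str:
    """Map 0.20.2NN.P to 1.NN.P, e.g. 0.20.206.0 -> 1.6.0 (illustrative only)."""
    parts = old.split(".")
    if len(parts) != 4 or parts[0] != "0" or parts[1] != "20":
        raise ValueError(f"not a 0.20.2xx.y version: {old}")
    dot, patch = int(parts[2]), int(parts[3])
    if dot < 200:
        # Assumption: only the 0.20.2xx security line is covered by this scheme.
        raise ValueError(f"not a 0.20.2xx.y version: {old}")
    return f"1.{dot - 200}.{patch}"


if __name__ == "__main__":
    for old in ("0.20.203.0", "0.20.205.1", "0.20.206.0"):
        print(old, "->", carry_over(old))
```

One consequence of this mapping is that the new minor numbers inherit the gaps in the old ones, so there would be no 1.1.x or 1.2.x releases.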
Re: [DISCUSS] Apache Hadoop 1.0? Joe Stein 2011-11-16, 05:37
If trunk releases would then mean 2.x.x, then the branch 1.x.x (0.20.206.0
being 1.6.0) makes total sense.

+1 (not binding)

So the current trunk release = 2.0.0 and the branch release 0.20.206.0 = 1.6.0.

Speaking from those of us that have < 4,000 nodes in our cluster and want
to proliferate the technology - one love, Hadoop.

Not sure, though, about Todd's comment on HBase & Hive and how
sister/brother projects have to deal with it. This should be important to
not orphan them more than maybe already has been done (cough cough, append).

On Tue, Nov 15, 2011 at 11:51 PM, Eli Collins <[EMAIL PROTECTED]> wrote:

> On Tue, Nov 15, 2011 at 8:14 PM, Arun Murthy <[EMAIL PROTECTED]> wrote:
> > I think this discussion is getting too wide, can we tease them apart?
> >
> > Do we agree we should call the forthcoming releases off
> > branch-0.20-security as 1.x.x?
> >
> > Let me start a vote for just that.
>
> +1
>
> IMO the values of x.x should match the current dot versions eg
> 0.20.206.0 would be 1.6.0.
>
> Thanks,
> Eli
>

--

/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop <http://twitter.com/#!/allthingshadoop>
*/
Re: [DISCUSS] Apache Hadoop 1.0? Arun Murthy 2011-11-16, 05:53
Eli,

Seems to me that trying to 'carry over' numbers from 0.20.2xx would,
at best, lead to confusion... similar to folks asking for non-existent
0.20.201/202.

I propose we look forward with hadoop-1.0.0 as the supported release
with security+append to keep things simple.

Thoughts?

thanks,
Arun

Sent from my iPhone

On Nov 15, 2011, at 8:52 PM, Eli Collins <[EMAIL PROTECTED]> wrote:

> On Tue, Nov 15, 2011 at 8:14 PM, Arun Murthy <[EMAIL PROTECTED]> wrote:
>> I think this discussion is getting too wide, can we tease them apart?
>>
>> Do we agree we should call the forthcoming releases off
>> branch-0.20-security as 1.x.x?
>>
>> Let me start a vote for just that.
>
> +1
>
> IMO the values of x.x should match the current dot versions eg
> 0.20.206.0 would be 1.6.0.
>
> Thanks,
> Eli
Re: [DISCUSS] Apache Hadoop 1.0? Eli Collins 2011-11-16, 06:13
On Tue, Nov 15, 2011 at 9:53 PM, Arun Murthy <[EMAIL PROTECTED]> wrote:
> Eli,
>
> Seems to me that trying to 'carry over' numbers from 0.20.2xx would,
> at best, lead to confusion... similar to folks asking for non-existent
> 0.20.201/202.
>
> I propose we look forward with hadoop-1.0.0 as the supported release
> with security+append to keep things simple.
>
> Thoughts?
>

Are you proposing 205.1 be called 1.0.1 and 206 will be 1.1.0? Or
that we rename 0.20.205.0 to be 1.0.0 and 205.1 will be 1.0.1 and 206.0
will be 1.1.0?

That seems just as confusing IMO but I don't feel strongly either way.

Thanks,
Eli
Re: [DISCUSS] Apache Hadoop 1.0? Arun Murthy 2011-11-16, 07:05
Thanks Eli. In keeping with the theme of 'looking ahead' I was
thinking of upcoming 0.20.205.1 as 1.0.0.

I'll clarify in the voting thread too.

Sent from my iPhone

On Nov 15, 2011, at 10:13 PM, Eli Collins <[EMAIL PROTECTED]> wrote:

> On Tue, Nov 15, 2011 at 9:53 PM, Arun Murthy <[EMAIL PROTECTED]> wrote:
>> Eli,
>>
>> Seems to me that trying to 'carry over' numbers from 0.20.2xx would,
>> at best, lead to confusion... similar to folks asking for non-existent
>> 0.20.201/202.
>>
>> I propose we look forward with hadoop-1.0.0 as the supported release
>> with security+append to keep things simple.
>>
>> Thoughts?
>>
>
> Are you proposing 205.1 be called 1.0.1 and 206 will be 1.1.0? Or
> that we rename 0.205.0 to be 1.0.0 and 205.1 will be 1.0.1 and 206.0
> will be 1.1.0?
>
> That seems just as confusing IMO but I don't feel strongly either way.
>
> Thanks,
> Eli
Re: [DISCUSS] Apache Hadoop 1.0? Konstantin Boudnik 2011-11-16, 02:06
I believe it has been advocated a number of times in this thread to release
0.22 as 2.0.

Are you suggesting dropping 0.22 out of the picture altogether? Any
reason for that?

Thanks,
Cos

On Tue, Nov 15, 2011 at 05:37PM, Doug Cutting wrote:
> On 11/15/2011 01:43 PM, Todd Lipcon wrote:
> > +1 to Steve's proposal. Renaming 0.20 is too big a pain at this point.
>
> Everyone seems to agree that we should rename 0.23 to either 2.0 or 3.0.
> There are a number of different views about what to do with 0.20, 0.21
> and 0.22. So maybe we should proceed where there's consensus and not
> argue extensively where there's disagreement?
>
> Since 0.23 has little install base yet it should be easy to rename. If
> we're not going to rename 0.20, 0.21 or 0.22 releases then 3.0 seems
> inappropriate.
>
> Can we agree to 0.23 -> 2.0? That's consistent with the MR2 nomenclature.
>
> Doug
Re: [DISCUSS] Apache Hadoop 1.0? Doug Cutting 2011-11-16, 17:15
On 11/15/2011 06:06 PM, Konstantin Boudnik wrote:
> Are you suggesting to drop 0.22 out of the picture all together? Any
> reason for that?

By no means. I thought that we might, as Scott Carey said, treat 0.22
as a minor release in the 1.x series. I'd prefer that we consistently
rename branches (0.20.x becomes 1.0.x, 0.21.x becomes 1.1.x, etc.).

We're rapidly falling into the trap of putting too much significance in
a version number, seeking some sort of marketing boost by declaring 1.0.
We can sidestep this by simply dropping the leading 0. and henceforth
referring to things as 20, 21, 22, etc. This minimizes confusion, since
there's no significant renaming, it gets us around the marketing issue
of still being pre-1.0, and it keeps us from putting too much importance
into version numbers.

Doug
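The two renaming schemes mentioned in the message above (renaming branches consistently so that 0.20.x becomes 1.0.x and 0.21.x becomes 1.1.x, or simply dropping the leading "0.") can be compared side by side with a small Python sketch. The function names are hypothetical, and the handling of four-part 0.20.2xx versions is an assumption, since the message does not address them.

```python
def consistent_rename(old: str) -> str:
    """0.20.x -> 1.0.x, 0.21.x -> 1.1.x, 0.22.x -> 1.2.x, and so on."""
    parts = old.split(".")
    if len(parts) < 2 or parts[0] != "0":
        raise ValueError(f"expected a 0.N... version: {old}")
    minor = int(parts[1]) - 20
    if minor < 0:
        raise ValueError(f"versions before 0.20 are not covered: {old}")
    # Assumption: any trailing components are carried over unchanged.
    return ".".join(["1", str(minor)] + parts[2:])


def drop_leading_zero(old: str) -> str:
    """0.22.0 -> 22.0: keep the existing numbers, lose only the leading '0.'."""
    if not old.startswith("0."):
        raise ValueError(f"expected a 0.N... version: {old}")
    return old[len("0."):]


if __name__ == "__main__":
    for old in ("0.20.2", "0.21.0", "0.22.0", "0.23.0"):
        print(old, "->", consistent_rename(old), "or", drop_leading_zero(old))
```

Under the first scheme 0.23.0 would come out as 1.3.0, whereas other messages in the thread want 0.23 to be a new major version.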
Re: [DISCUSS] Apache Hadoop 1.0? Konstantin Boudnik 2011-11-16, 17:24
On Wed, Nov 16, 2011 at 09:15AM, Doug Cutting wrote:
> On 11/15/2011 06:06 PM, Konstantin Boudnik wrote:
> > Are you suggesting to drop 0.22 out of the picture all together? Any
> > reason for that?
>
> By no means. I thought that we might, as Scott Carey said, treat 0.22
> as a minor release in the 1.x series. I'd prefer that we consistently
> rename branches (0.20.x becomes 1.0.x, 0.21.x becomes 1.1.x, etc.).

Thanks for the explanation. I see your point in 1.?.x renames. My only concern
is that it might suggest to users that 1.2.0 (e.g. current 0.22) is a
sort of natural continuation from 1.0.0 (current 0.20.x) and that the upgrade
would be easy and automatic. Which isn't necessarily the case, IMO.

Separating them in two major versions won't be sending such a message.

> We're rapidly falling into the trap of putting too much significance in
> a version number, seeking some sort of marketing boost by declaring 1.0.
> We can sidestep this by simply dropping the leading 0. and henceforth
> referring to things as 20, 21, 22, etc. This minimizes confusion, since
> there's no significant renaming, it gets us around the marketing issue
> of still being pre-1.0, and it keeps us from putting too much importance
> into version numbers.

I guess this might work too.

Cos
Re: [DISCUSS] Apache Hadoop 1.0? Scott Carey 2011-11-16, 18:15
On 11/16/11 9:24 AM, "Konstantin Boudnik" <[EMAIL PROTECTED]> wrote:

>On Wed, Nov 16, 2011 at 09:15AM, Doug Cutting wrote:
>> On 11/15/2011 06:06 PM, Konstantin Boudnik wrote:
>> > Are you suggesting to drop 0.22 out of the picture all together? Any
>> > reason for that?
>>
>> By no means. I thought that we might, as Scott Carey said, treat 0.22
>> as a minor release in the 1.x series. I'd prefer that we consistently
>> rename branches (0.20.x becomes 1.0.x, 0.21.x becomes 1.1.x, etc.).
>
>Thanks for the explanation. I see your point in 1.?.x renames. My only
>concern is that it might suggest that to the users that 1.2.0 (e.g. current
>0.22) is a sort of natural continuation from 1.0.0 (current 0.20.x) and the
>upgrade would be easy and automatic. Which isn't necessary the case, IMO.

IMO what is important from the development and maintenance perspective is
the _meaning_ of the major.minor.patch numbers as described in my previous
message.

If a minor version number bump means that it is a superset of the previous
release and is backwards compatible, then that requirement on its own
answers whether 0.22 can become 1.1, or if it must be a 2.0 release.

Whether hadoop starts using a new meaning for major.minor.patch is what is
of interest to me; starting at 1.x.y or 20.x.y or 999.x.y is marketing.

The version number is completely meaningless on its own, pure marketing.
However, if the numbers gain meaning through a clear definition of what
the major.minor.patch numbers signify, then there is meaning and structure
going forward.

The current state of affairs seems to be:
major: always 0
minor: potentially big changes; almost always breaks wire compatibility;
occasionally breaks API backwards compatibility
patch: typically bug fixes only; 'bug fix' not well defined; almost never
breaks API or wire compatibility

I think the community can decide two things independently:

- Should 0.20.20x be renamed 1.0.y ? (perhaps not, perhaps 0.23 should be
  1.0 and the others left alone).
- Should hadoop adopt a new clear definition of major.minor.patch number
  significance?

example proposal:
* major version number increment: signifies breaks in API backwards
  compatibility and/or major architecture overhauls.
* minor version number increment: signifies possible API changes, but
  maintains API backwards compatibility. Wire compatibility may break (see
  release notes). Included functionality is a superset of previous minor
  release.
* patch version number increment: signifies a release where all
  improvements are fully backwards compatible with the previous patch
  version, including wire format.

Any release may contain new features or improvements, provided they don't
break the compatibility rules and the release manager approves of the
inclusion. It is not worth defining whether a change is a 'bug fix', 'new
feature' or 'improvement' and dictating any rules based on that -- these
can often blur together and can be dealt with on a case by case basis
instead of through version rules. IMO guiding the meaning of version
numbers by compatibility class makes the most sense.

Whatever the meaning of the numbers turns out to be will dictate whether
releases after a 1.0.x need to be 2.0.x or can be 1.1.x

>Separating them in two major versions won't be sending such a message.
>
>> We're rapidly falling into the trap of putting too much significance in
>> a version number, seeking some sort of marketing boost by declaring 1.0.
>> We can sidestep this by simply dropping the leading 0. and henceforth
>> referring to things as 20, 21, 22, etc. This minimizes confusion, since
>> there's no significant renaming, it gets us around the marketing issue
>> of still being pre-1.0, and it keeps us from putting too much importance
>> into version numbers.
>
>I guess this might work too.
>
>Cos
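The example proposal in the message above defines version bumps by compatibility class rather than by whether a change is a bug fix or a feature. A minimal Python sketch of that rule follows. The `Change` record, its field names, and the functions are hypothetical illustrations; the sketch computes the smallest bump the rules would require and does not model the release manager's discretion to choose a larger one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """Hypothetical summary of what a release breaks relative to the previous one."""
    breaks_api: bool = False             # API backwards compatibility is broken
    architecture_overhaul: bool = False  # major architecture overhaul
    breaks_wire: bool = False            # wire compatibility is broken


def required_bump(change: Change) -> str:
    """Smallest version component that must change under the example proposal."""
    if change.breaks_api or change.architecture_overhaul:
        return "major"
    if change.breaks_wire:
        # Minor releases keep API compatibility but may break the wire format.
        return "minor"
    # Fully backwards compatible, including wire format.
    return "patch"


def next_version(current: str, change: Change) -> str:
    """Apply the required bump to a major.minor.patch version string."""
    major, minor, patch = (int(p) for p in current.split("."))
    bump = required_bump(change)
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


if __name__ == "__main__":
    print(next_version("1.0.0", Change()))                  # compatible release
    print(next_version("1.0.0", Change(breaks_wire=True)))  # wire format changes
    print(next_version("1.0.0", Change(breaks_api=True)))   # API break
```

Read this way, the question of whether 0.22 could follow a 1.0.x line as 1.1 or must be 2.0 reduces to whether it breaks API compatibility with that line.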
Re: [DISCUSS] Apache Hadoop 1.0? Doug Cutting 2011-11-16, 19:57
On 11/16/2011 10:15 AM, Scott Carey wrote:
> IMO what is important from the development and maintenance perspective is
> the _meaning_ of the
> major.minor.patch numbers as described in my previous message.
>
> If a minor version number bump means that it is a superset of the previous
> release and is backwards compatible, then that requirement on its own
> answers whether 0.22 can become 1.1, or if it must be a 2.0 release.
>
> Whether hadoop starts using a new meaning for major.minor.patch is what is
> of interest to me; starting at 1.x.y or 20.x.y or 999.x.y is marketing.

Scott, this is a great point. Thanks for making it.

> The version number is completely meaningless on its own, pure marketing.
> However, if the numbers gain meaning through a clear definition of what
> the major.minor.patch numbers signify, then there is meaning and structure
> going forward.
> The current state of affairs seems to be:
> major: always 0
> minor: potentially big changes; almost always breaks wire compatibility;
> occasionally breaks API backwards compatibility
> patch: typically bug fixes only; 'bug fix' not well defined; almost never
> breaks API or wire compatibility

Long ago I proposed such rules for Hadoop releases at:

http://wiki.apache.org/hadoop/Roadmap

These state that pre-1.0 releases behave roughly as above.

> I think the community can decide two things independently:
>
> - Should 0.20.20x be renamed 1.0.y ? (perhaps not, perhaps 0.23 should be
> 1.0 and the others left alone).
> - Should hadoop adopt a new clear definition of major.minor.patch number
> significance?

Would you care to call a vote on one or both of these?

> example proposal:
> * major version number increment: signifies breaks in API backwards
> compatibility and/or major architecture overhauls.
> * minor version number increment: signifies possible API changes, but
> maintains API backwards compatibility. Wire compatibility may break (see
> release notes). Included functionality is a superset of previous minor
> release.
> * patch version number increment: signifies a release where all
> improvements are fully backwards compatible with the previous patch
> version, including wire format.

This is also similar to what the Roadmap wiki page indicates for
post-1.0 releases.

Renaming things after the fact to try to make them consistent when the
prior rules weren't consistently followed is not easy. Instead we might
better focus on rules that we intend to obey for releases going forward
and then obey them.

> Whatever the meaning of the numbers turns out to be will dictate whether
> releases after a 1.0.x need to be 2.0.x or can be 1.1.x

Good point. The most accurate approach would probably be to call each
existing branch a distinct major release. Dropping the leading zero
would reduce confusion and avoid marketing but would still combine
0.20.x and 0.20.20x which perhaps ought to be considered separate major
releases. For me this is however a reasonable tradeoff since we're
better off focusing on improving things in the future than arguing about
marketing and how to hide our past versioning mistakes.

Doug
Doug Cutting 2011-11-16, 19:57
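Scott's example proposal, quoted in the message above, can be read as a small decision rule: given what a release changes, which version component has to move? The following is only a sketch under stated assumptions. `ReleaseChanges`, `required_bump`, and `bump` are hypothetical names that are not part of Hadoop, and treating a pure API addition as a minor bump is one interpretation of "superset of previous minor release", not something the thread settles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReleaseChanges:
    """Hypothetical summary of what a release changes relative to its predecessor."""
    breaks_api: bool = False   # removes or alters an existing public API
    breaks_wire: bool = False  # changes the wire format incompatibly
    adds_api: bool = False     # adds API while existing callers keep working


def required_bump(changes: ReleaseChanges) -> str:
    """Return the smallest version component that must be incremented."""
    if changes.breaks_api:
        return "major"  # API backwards compatibility is broken
    if changes.breaks_wire or changes.adds_api:
        return "minor"  # API stays compatible; wire format may break
    return "patch"      # fully backwards compatible, including wire format


def bump(version: str, component: str) -> str:
    """Increment one component of 'major.minor.patch', resetting the lower ones."""
    major, minor, patch = (int(part) for part in version.split("."))
    if component == "major":
        return f"{major + 1}.0.0"
    if component == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```

For instance, `bump("1.0.3", required_bump(ReleaseChanges(breaks_wire=True)))` yields `"1.1.0"` under these assumptions, while an API break on the same base would yield `"2.0.0"`.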
Re: [DISCUSS] Apache Hadoop 1.0? Matt Foley 2011-11-16, 21:11
I support giving all three active code branches a clean start, on an equal
footing:

- The next release of 0.20-security (formerly expected as "0.20.205.1") to
  be 1.0.0, establishing branch-1.0
- The next release of 0.22 to be 2.0.0, establishing branch-2.0
- The recent release of 0.23.0 to be 3.0.0, establishing branch-3.0,
  from which the formerly expected "0.23.1" may be released as 3.0.1
- All three code branches to obey the established major.minor.patch
  versioning rules going forward.
- So the next release from trunk to be 3.1.0 or 4.0.0, at the choice of the
  then release manager, and the pleasure of the community.

Regards,
--Matt
Re: [DISCUSS] Apache Hadoop 1.0? Owen O'Malley 2011-11-16, 21:37
+1 to Matt's proposal, although I'd modify it slightly to say that:

branch-0.20-security -> branch-1
branch-0.20-security-205 -> branch-1.0

-- Owen
Re: [DISCUSS] Apache Hadoop 1.0? Joe Stein 2011-11-16, 21:53
+1 to Owen's slight modification and to Matt's proposal with a minor (no
pun intended) suggestion:

branch-0.20-security -> branch-1.0
branch-0.20-security-205 -> branch-1.1.0

On Wed, Nov 16, 2011 at 4:37 PM, Owen O'Malley <[EMAIL PROTECTED]> wrote:
> +1 to Matt's proposal, although I'd modify it slightly to say that:
>
> branch-0.20-security -> branch-1
> branch-0.20-security-205 -> branch-1.0
>
> -- Owen

--
/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop <http://twitter.com/#!/allthingshadoop>
*/
Re: [DISCUSS] Apache Hadoop 1.0? Roman Shaposhnik 2011-11-16, 23:05
On Wed, Nov 16, 2011 at 1:11 PM, Matt Foley <[EMAIL PROTECTED]> wrote:
> I support giving all three active code branches a clean start, on an equal
> footing:
>
> - The next release of 0.20-security (formerly expected as "0.20.205.1") to
>   be 1.0.0, establishing branch-1.0
> - The next release of 0.22 to be 2.0.0, establishing branch-2.0
> - The recent release of 0.23.0 to be 3.0.0, establishing branch-3.0,
>   from which the formerly expected "0.23.1" may be released as 3.0.1
> - All three code branches to obey the established major.minor.patch
>   versioning rules going forward.
> - So the next release from trunk to be 3.1.0 or 4.0.0, at the choice of the
>   then release manager, and the pleasure of the community.

+1 on all the points above. This is by far the most reasonable proposal
I've seen on this thread.

Thanks,
Roman.
Re: [DISCUSS] Apache Hadoop 1.0? Andrew Purtell 2011-11-16, 23:40
> On Wed, Nov 16, 2011 at 1:11 PM, Matt Foley wrote:
> I support giving all three active code branches a clean start, on an equal
> footing:
>
> - The next release of 0.20-security (formerly expected as "0.20.205.1") to
>   be 1.0.0, establishing branch-1.0
> - The next release of 0.22 to be 2.0.0, establishing branch-2.0
> - The recent release of 0.23.0 to be 3.0.0, establishing branch-3.0,
>   from which the formerly expected "0.23.1" may be released as 3.0.1
> - All three code branches to obey the established major.minor.patch
>   versioning rules going forward.
> - So the next release from trunk to be 3.1.0 or 4.0.0, at the choice of the
>   then release manager, and the pleasure of the community.

+1 non binding

Clean and easily understood. After an initial round of "what just
happened?", it will be much easier explaining Hadoop evolution to users.

- Andy
Re: [DISCUSS] Apache Hadoop 1.0? Arun C Murthy 2011-11-17, 00:03
How about this?

I have a vote running for 1.x at this point. We seem to agree about
major/minor/patch versioning and the need for compatibility.

Beyond that, all other releases (at this point), whether it's 0.22
(unreleased) or 0.23 (very alpha), are not worth debating endlessly. Should
we just revisit the versioning discussion when we are ready to release them
and/or support them?

I'm happy to continue using 0.23.x for now - I'd rather spend time on
fixing 0.23.x than debating now.

To me this seems like a very Apache thing to do: what matters is the code
and the community - debates on versioning can come later when the bits are
ready. No amount of labelling will either produce or stabilize the
software.

Thoughts?

Arun
Re: [DISCUSS] Apache Hadoop 1.0? Eric Yang 2011-11-17, 00:05
+1 on Matt's proposal.
Re: [DISCUSS] Apache Hadoop 1.0? Konstantin Boudnik 2011-11-17, 05:54
On Wed, Nov 16, 2011 at 01:11PM, Matt Foley wrote:
> I support giving all three active code branches a clean start, on an equal
> footing:
>
> - The next release of 0.20-security (formerly expected as "0.20.205.1") to
>   be 1.0.0, establishing branch-1.0
> - The next release of 0.22 to be 2.0.0, establishing branch-2.0
> - The recent release of 0.23.0 to be 3.0.0, establishing branch-3.0,
>   from which the formerly expected "0.23.1" may be released as 3.0.1
> - All three code branches to obey the established major.minor.patch
>   versioning rules going forward.

+1 on all three

Cos

> - So the next release from trunk to be 3.1.0 or 4.0.0, at the choice of the
>   then release manager, and the pleasure of the community.
Re: [DISCUSS] Apache Hadoop 1.0? Arun C Murthy 2011-11-16, 22:43
On Nov 16, 2011, at 11:57 AM, Doug Cutting wrote:
> On 11/16/2011 10:15 AM, Scott Carey wrote:
>> - Should hadoop adopt a new clear definition of major.minor.patch number
>> significance?
>
> Would you care to call a vote on one or both of these?

Great points, Scott and Doug.

I agree about the need for clarity on major/minor/patch significance. I'll
start a vote and update the Roadmap with the results.

Along with that, we also need a clear idea of what a major version bump
means. I propose we adopt the convention that a new major version should be
a superset of the previous major version, features-wise.

Does that sound reasonable?

thanks,
Arun
Re: [DISCUSS] Apache Hadoop 1.0? Doug Cutting 2011-11-16, 23:02
On 11/16/2011 02:43 PM, Arun C Murthy wrote:
> I propose we adopt the convention that a new major version should be a
> superset of the previous major version, features-wise.

That means that we could never discard a feature, no?

One definition is that a major release includes some fundamental changes,
e.g., new primary APIs or a re-implementation of primary components. MR2
probably qualifies as both. With a large system with many APIs and
components this becomes a rather subjective measure, but I don't see an
easy way around that.

Another definition is that a major release permits incompatible changes,
either in APIs, wire-formats, on-disk formats, etc. This is a more
objective measure. For example, one might in release X+1 deprecate features
of release X but still remain compatible with them, while in X+2 we'd
remove them. So every major release would make incompatible changes, but
only of things that had been deprecated two releases ago. Often the reason
for the incompatible changes is new primary APIs or re-implementation of
primary components, but those more subjective measures would not be the
justification for the major version, rather any incompatible changes would.

Of course, we should work hard to never make incompatible changes...

Doug
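The deprecate-then-remove cycle Doug describes can be sketched as a small check. This is an illustration only: `removable_features` and the feature names are hypothetical, and it assumes the "deprecate in X+1, remove in X+2" reading, meaning a feature may be removed in any major release later than the one that deprecated it.

```python
from typing import Dict, List, Optional


def removable_features(deprecated_in: Dict[str, Optional[int]],
                       target_major: int) -> List[str]:
    """List features that may be removed in `target_major`.

    `deprecated_in` maps a feature name to the major release that deprecated
    it, or None if the feature has never been deprecated. A feature qualifies
    only if it was deprecated in an earlier major release, so users have had
    at least one full major release to migrate.
    """
    return sorted(
        name
        for name, deprecated_major in deprecated_in.items()
        if deprecated_major is not None and deprecated_major < target_major
    )


# Hypothetical feature inventory: names and release numbers are made up.
features = {"feature_a": 2, "feature_b": 3, "feature_c": None}
print(removable_features(features, target_major=3))
```

Under this reading, a feature deprecated in the same release that wants to remove it never qualifies, which is what keeps each major release compatible with its immediate predecessor's non-deprecated surface.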
Re: [DISCUSS] Apache Hadoop 1.0? Arun C Murthy 2011-11-16, 23:05
Agreed.

We will discard features as we go along, but we need to have consensus to
discard major features. Is that fair? And we discard them for reasons you
outlined...

Arun
Re: [DISCUSS] Apache Hadoop 1.0? Arun C Murthy 2011-11-16, 23:13
On Nov 16, 2011, at 3:02 PM, Doug Cutting wrote:
>
> Another definition is that a major release permits incompatible changes,
> either in APIs, wire-formats, on-disk formats, etc. This is a more
> objective measure. For example, one might in release X+1 deprecate
> features of release X but still remain compatible with them, while in
> X+2 we'd remove them. So every major release would make incompatible
> changes, but only of things that had been deprecated two releases ago.
> Often the reason for the incompatible changes is new primary APIs or
> re-implementation of primary components, but those more subjective
> measures would not be the justification for the major version, rather
> any incompatible changes would.

If I wasn't clear, I'd much prefer this objective measure.

+1

Arun
Re: [DISCUSS] Apache Hadoop 1.0? sanjay Radia 2011-11-17, 01:11
On Nov 16, 2011, at 3:02 PM, Doug Cutting wrote:
>
> Another definition is that a major release permits incompatible changes,
> either in APIs, wire-formats, on-disk formats, etc. This is a more
> objective measure. For example, one might in release X+1 deprecate
> features of release X but still remain compatible with them, while in
> X+2 we'd remove them. So every major release would make incompatible
> changes, but only of things that had been deprecated two releases ago.
> Often the reason for the incompatible changes is new primary APIs or
> re-implementation of primary components, but those more subjective
> measures would not be the justification for the major version, rather
> any incompatible changes would.

This is mostly consistent with what is stated with respect to API changes
in HADOOP-5071 on "Hadoop Compatibility requirements":
https://issues.apache.org/jira/browse/HADOOP-5071.

HADOOP-5071 was derived from a long series of email discussions and
describes some of the subtle nuances of compatibility for API, on-disk
format, wire, etc.

Some notes (see details there):
1) A break in compatibility => major number change, but a major number
   change does NOT => a break in compatibility.
2) We routinely change the on-disk format on HDFS but do an automatic
   upgrade. That is okay and allowed without a major number change.

> Of course, we should work hard to never make incompatible changes...

Agreed. Once things are in customer hands it is very hard to remove even
deprecated methods. (But once in a while we have to do it, after sufficient
time to upgrade to new APIs.) Java, for example, does not remove deprecated
methods.

sanjay
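Sanjay's two notes describe a one-way implication, which is easy to get backwards. A minimal sketch follows, assuming hypothetical change labels and a hypothetical `release_allowed` helper; nothing here comes from HADOOP-5071 itself, and which change kinds count as "breaking" is an assumption made for illustration.

```python
from typing import Set, Tuple

# Assumed labels for change kinds; only these two are treated as breaking.
# "on_disk_format_with_auto_upgrade" is deliberately absent (note 2).
BREAKING_CHANGES = {"api_break", "on_disk_format_without_upgrade"}


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn 'major.minor.patch' into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split("."))


def release_allowed(old: str, new: str, changes: Set[str]) -> bool:
    """Check a proposed version number against the one-way rule.

    A compatibility break requires a major bump (note 1), but a major bump
    is allowed even when nothing breaks.
    """
    old_v, new_v = parse_version(old), parse_version(new)
    if new_v <= old_v:
        return False  # a new release must have a higher version
    major_bumped = new_v[0] > old_v[0]
    has_break = bool(changes & BREAKING_CHANGES)
    return major_bumped or not has_break
```

With these assumptions, `release_allowed("1.0.3", "1.1.0", {"api_break"})` is rejected, `release_allowed("1.0.3", "2.0.0", set())` is accepted even though nothing breaks, and an on-disk format change with automatic upgrade passes as a patch release.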
Re: [DISCUSS] Apache Hadoop 1.0? Nathan Roberts 2011-11-16, 23:51
On 11/16/11 4:43 PM, "Arun C Murthy" <[EMAIL PROTECTED]> wrote:
> I propose we adopt the convention that a new major version should be a
> superset of the previous major version, features-wise.

Just so I'm clear: this is only guaranteed at the time the new major
version is started. A day later a previous major line may merge a feature
from trunk, and then it's no longer the case that 2.x.y is a superset. If
that's the case, I'm not sure of the value of the convention. We could say
that new major versions always start from trunk, but that doesn't have
meaning outside of the developer community.

On Nov 16, 2011, at 3:02 PM, Doug Cutting wrote:
>
> Another definition is that a major release permits incompatible changes,
> either in APIs, wire-formats, on-disk formats, etc.

Are our wire formats stable enough in all release lines that we're ready to
live by this? It would mean the 1.x.y line could not change a wire format
without a bump to the major number, which would obviously cause issues.
Even in the 23.x line I thought there were still some wire compatibility
changes pending, which would mean we'd quickly be moving to a 4.x line.

Nathan
Re: [DISCUSS] Apache Hadoop 1.0? Doug Cutting 2011-11-17, 00:13
On 11/16/2011 03:51 PM, Nathan Roberts wrote:
>> Another definition is that a major release permits incompatible changes,
>> either in APIs, wire-formats, on-disk formats, etc.
> Are our wire formats stable enough in all release lines that we're ready
> to live by this?

No. Long-term we'd like to only break wire-compatibility in major
releases, and ideally not even then. So the set of things that have to be
compatible within minor releases does not currently include wire formats,
but we hope eventually will.

Doug
Re: [DISCUSS] Apache Hadoop 1.0? Scott Carey 2011-11-17, 01:37
On 11/16/11 4:13 PM, "Doug Cutting" <[EMAIL PROTECTED]> wrote:
> On 11/16/2011 03:51 PM, Nathan Roberts wrote:
>>> Another definition is that a major release permits incompatible
>>> changes, either in APIs, wire-formats, on-disk formats, etc.
>> Are our wire formats stable enough in all release lines that we're
>> ready to live by this?
>
> No. Long-term we'd like to only break wire-compatibility in major
> releases, and ideally not even then. So the set of things that have to
> be compatible within minor releases does not currently include wire
> formats, but we hope eventually will.

Currently, it can probably be enforced that patch versions don't change
wire format, and that minor versions don't break API. Major versions may
break both.

At a later time, it may be possible to enforce that minor version bumps be
wire compatible. At that time, the version rules going forward can change.

> Doug
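Scott's staged rules (what can be enforced now versus later) lend themselves to a small policy table. The sketch below is one possible encoding, not an agreed policy: `CURRENT_RULES`, `LATER_RULES`, and `violations` are hypothetical names, and the two change kinds "api" and "wire" are simplifications of the compatibility classes discussed in the thread.

```python
from typing import Dict, Iterable, List, Set

# Which kinds of compatibility break each bump level may contain today:
# patch releases break nothing, minor releases may break wire format,
# major releases may break both wire format and API.
CURRENT_RULES: Dict[str, Set[str]] = {
    "patch": set(),
    "minor": {"wire"},
    "major": {"wire", "api"},
}

# The stricter rules hoped for later: minor releases become wire compatible.
LATER_RULES: Dict[str, Set[str]] = {**CURRENT_RULES, "minor": set()}


def violations(rules: Dict[str, Set[str]], bump: str,
               breaks: Iterable[str]) -> List[str]:
    """Return the breaks that the given bump level is not allowed to contain."""
    return sorted(set(breaks) - rules[bump])


print(violations(CURRENT_RULES, "minor", {"wire"}))
print(violations(LATER_RULES, "minor", {"wire"}))
```

Keeping the rules as data means the later tightening is a one-line change to the table, which matches the point that the version rules themselves can change going forward.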
Re: [DISCUSS] Apache Hadoop 1.0? Scott Carey 2011-11-17, 02:06
On 11/16/11 3:51 PM, "Nathan Roberts" <[EMAIL PROTECTED]> wrote:
> On 11/16/11 4:43 PM, "Arun C Murthy" <[EMAIL PROTECTED]> wrote:
>> I propose we adopt the convention that a new major version should be a
>> superset of the previous major version, features-wise.
> Just so I'm clear. This is only guaranteed at the time the new major
> version is started. A day later a previous major line may merge a feature
> from trunk and then it's no longer the case that 2.x.y is a superset. If
> that's the case I'm not sure of the value of the convention. We could say
> that new major versions always start from trunk, but that doesn't have
> meaning outside of the developer community.

I don't think in general one can say that major versions are a superset of
previous major versions. Then you would need to have a SuperMajor version
number for the (rare) times that this was broken.
In other words, the major version number really can't have any
restrictions.
Perhaps, however, one can say that minor versions are supersets of prior
minor versions, if one were to define 'superset'.

It's going to be hard to claim that the 0.23 branch is a superset of 0.22
-- After all, there is no JobTracker and all sorts of stuff has been
removed or replaced with something else. Whether that defines a superset
or not gets into a lot of semantics of what we mean by 'superset'.
Perhaps, like 'feature' or 'bug fix', it is best not to get into the
semantics of defining what we mean by 'superset' and rather define version
number meaning only in terms of compatibility classifications. Especially
since the compatibility classification has implications for all of these
other things -- and IMO more clearly useful ones. For example, consider
that a "bug fix" may break wire compatibility, that a tiny harmless change
can be considered a "new feature", or that replacing a single link in a UI
could be considered breaking a "superset" rule.
Re: [DISCUSS] Apache Hadoop 1.0?Steve Loughran 2011-11-17, 10:45
On 17/11/11 02:06, Scott Carey wrote:
> > > On 11/16/11 3:51 PM, "Nathan Roberts"<[EMAIL PROTECTED]> wrote: > >> On 11/16/11 4:43 PM, "Arun C Murthy"<[EMAIL PROTECTED]> wrote: >>> I propose we adopt the convention that a new major version should be a >>> superset of the previous major version, features-wise. >> Just so I'm clear. This is only guaranteed at the time the new major >> version is started. A day later a previous major line may merge a feature >>from trunk and then it's no longer the case that 2.x.y is a superset. If >> that's the case I'm not sure of the value of the convention. We could say >> that new major versions always start from trunk, but that doesn't have >> meaning outside of the developer community. > > I don't think in general one can say that major versions are a superset of > previous major versions. Then you would need to have a SuperMajor version > number for the (rare) times that this was broken. > In other words, the major version number really can't have any > restrictions. > Perhaps however, one can say that minor versions are supersets of prior > minor version if one were to define 'superset'. > > Its going to be hard to claim that the 0.23 branch is a superset of 0.22 > -- After all, there is no JobTracker and all sorts of stuff has been > removed or replaced with something else. Whether that defines a superset > or not gets into a lot of semantics of what we mean by 'superset'. > Perhaps like 'feature' or 'bug fix', it is best not to get into the > semantics of defining what we mean by 'superset' and rather define version > number meaning only in terms of compatibility classifications. Especially > since the compatibility classification has implications for all of these > other things -- and IMO more clearly useful ones. For example, consider > that a "bug fix" may break wire compatibility, that a tiny harmless change > can be considered a "new feature", or that replacing a single link in a UI > could be considered breaking a "superset" rule. 
I think it would be good to distinguish user-API supersets/subsets from internal supersets/subsets.

- 0.23 is a superset of the MR and HDFS APIs compatible with previous versions (I don't know or care whether or not it is a proper superset). The goal here is that end-user apps and higher levels in the stack (in-ASF and out-ASF) should work, though testing is required to verify this. A failure of the layers above to work with 0.23+ is something that should be considered a regression, looked at, and then either dismissed as "you weren't meant to do that" or used to trigger a fix.

- 0.23 has changed the back-end means by which jobs are scheduled; the monitoring APIs have changed, etc. Where people will see a visible difference is in the JT Web UI. That's not an API-level change. A failure of any code that goes into this bit of the system to compile or run against 0.23 is something people can feel slightly sorry about, but not enough to trigger reversions.

What I will miss in 0.23 is the MiniMRCluster, which I consider to be part of the API. Certainly it's why I pull hadoop-common-test-0.20.20x.jar into downstream builds, because it is the simplest way to do basic tests in junit of MR operations. It's also the most lightweight way to do single-machine Hadoop runs over small datasets.
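To make the quoted suggestion of defining version numbers "only in terms of compatibility classifications" a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the three classes, the names `Compat` and `next_version`, and the mapping from class to version component are hypothetical and were not agreed anywhere in this thread.

```python
from enum import IntEnum
from typing import Iterable, Tuple


class Compat(IntEnum):
    """Hypothetical compatibility classes, ordered from least to most disruptive."""
    COMPATIBLE = 0     # no visible change to user-facing APIs or wire protocols
    API_ADDITION = 1   # user-facing API grows; existing user code still works
    INCOMPATIBLE = 2   # breaks a user-facing API or a wire protocol


def next_version(current: Tuple[int, int, int],
                 changes: Iterable[Compat]) -> Tuple[int, int, int]:
    """Pick the next (major, minor, patch) from the worst change in a release.

    The label on a change ("bug fix", "feature") is deliberately ignored;
    only its compatibility class matters. The mapping below is assumed.
    """
    major, minor, patch = current
    worst = max(changes, default=Compat.COMPATIBLE)
    if worst == Compat.INCOMPATIBLE:
        return (major + 1, 0, 0)
    if worst == Compat.API_ADDITION:
        return (major, minor + 1, 0)
    return (major, minor, patch + 1)


# A so-called "bug fix" that breaks wire compatibility still forces a major bump.
print(next_version((1, 2, 3), [Compat.COMPATIBLE, Compat.INCOMPATIBLE]))
```

Under a scheme like this, whether a release is a "superset" of another never has to be defined; the only question asked of each change is which compatibility class it falls into.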
Steve Loughran 2011-11-17, 10:45
Re: [DISCUSS] Apache Hadoop 1.0?
Roman Shaposhnik 2011-11-17, 16:33
On Thu, Nov 17, 2011 at 2:45 AM, Steve Loughran <[EMAIL PROTECTED]> wrote:
> -0.23 is a superset of the MR and HDFS APIs compatible with previous
> versions (I don't know or care whether or not it is a proper superset or
> not). The goal here is that end user apps and higher levels in the stack
> (in-ASF and out-ASF) should work, though testing is required to verify this.

I believe that by now we have enough factual evidence that at least framework-level APIs are incompatible. That's exactly why every single downstream component needs to be patched at the level of code to work against 0.23.

Thanks,
Roman.
Re: [DISCUSS] Apache Hadoop 1.0?
Arun C Murthy 2011-11-17, 19:09
Roman,
On Nov 17, 2011, at 8:33 AM, Roman Shaposhnik wrote:

> On Thu, Nov 17, 2011 at 2:45 AM, Steve Loughran <[EMAIL PROTECTED]> wrote:
>> -0.23 is a superset of the MR and HDFS APIs compatible with previous
>> versions (I don't know or care whether or not it is a proper superset or
>> not). The goal here is that end user apps and higher levels in the stack
>> (in-ASF and out-ASF) should work, though testing is required to verify this.
>
> I believe that by now we have enough factual evidence that at least
> framework-level
> APIs are incompatible.

Let me clarify to help you understand the distinction.

Both HDFS and MR have 'framework' apis (such as details of NN/DN and JT/TT) and 'end user' apis (such as open/read/write/close or Mapper/Reducer/InputFormat/OutputFormat etc., more here: http://hadoop.apache.org/common/docs/stable/mapred_tutorial.html).

hadoop-0.23 aims to be 'compatible' for end-users so that they don't need to modify their applications to use the new release. Also, we have both the 'old MR apis' and the 'new Context Objects MR apis' in 0.23.

The 'framework' apis are a different ballgame since the underlying framework, particularly in MR, has changed significantly. We have replaced the old JT/TT based 'classic' framework with the new 'yarn' framework consisting of ResourceManager/NodeManager. There are similar, but more subtle changes in the NameNode/DataNode for HDFS - then there is the append rewrite. As a result, the wire-protocols have changed significantly, and so we are bumping up the 'major version' to reflect that.

The crux of the matter: end-user applications do NOT need to be _modified_, they just have to be recompiled against the new libraries. If you do see any reason for applications to be modified please open a jira and we'll ensure we get it fixed asap. Have you seen any such instance?

> That's exactly why every single downstream component
> needs to be patched at the level of code to work against 0.23.

Now, a downstream project such as HBase, Hive or Pig isn't the 'normal end-user application'. These projects can choose to use undocumented/non-public (e.g. LimitedPrivate) apis and we are committed to working with them to ensure a smooth transition.

I don't know which are the ones in 'every single downstream component' - care to enumerate?

The ones I'm aware of, which have since been fixed, are:
https://issues.apache.org/jira/browse/HBASE-4510 -> https://issues.apache.org/jira/browse/HDFS-2412 (we fixed the internal HDFS apis so that HBase can continue to use them)
https://issues.apache.org/jira/browse/PIG-2125 -> https://issues.apache.org/jira/browse/MAPREDUCE-3138 (we fixed MR to allow apps to deal with inconsistency in 'new' MR apis which changed in 0.21).

I'm not aware of anything else - what else do you see?

As a result, the downstream projects ensure their own end-users and applications (HBase apps, Pig scripts, Hive queries) etc. do NOT see any incompatibilities.

----

In summary, please take a careful look at the 'factual information' before you decide to proclaim your beliefs about important aspects such as 'incompatibility' - it's key to ensure we don't confuse end-users and have a smooth adoption of newer releases.

thanks,
Arun
Re: [DISCUSS] Apache Hadoop 1.0?
Roman Shaposhnik 2011-11-17, 19:31
On Thu, Nov 17, 2011 at 11:09 AM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
> I don't know which are the ones in 'every single downstream component' - care to enumerate?
>
> The ones I'm aware of, which have since been fixed are:
> https://issues.apache.org/jira/browse/HBASE-4510 -> https://issues.apache.org/jira/browse/HDFS-2412 (we fixed the internal HDFS apis so that HBase can continue to use them)
> https://issues.apache.org/jira/browse/PIG-2125 -> https://issues.apache.org/jira/browse/MAPREDUCE-3138 (
> we fixed MR to allow apps deal with inconsistency in 'new' MR apis which changed in 0.21).
>
> I'm not aware of anything else - what else do you see?
>
> In summary, please take a careful look at the 'factual information' before you decide to proclaim your beliefs
> about important aspects such as 'incompatibility' - it's key to ensure we don't confuse end-users and have a smooth adoption of newer releases.

First of all, I would appreciate it if you refrained from statements that sound like you're lecturing me on a public mailing list.

Here's what I said. Let me spell it out once again:

"I believe that by now we have enough factual evidence that at least framework-level APIs are incompatible."

Here's the umbrella Bigtop JIRA that tracks those incompatibilities:
https://issues.apache.org/jira/browse/BIGTOP-162

Nowhere in my email did I imply that I *know* of cases where user-level APIs would break. That said, without a formal verification of backwards compatibility I can NOT make the inverse statement either. That's why I used "at least", which according to a dictionary has connotations of "according to the lowest possible assessment; not less than".

Once again, I'm in no position to assess the level of API compatibility of user-level APIs. That's where your expertise comes in.

However, please do NOT tell me that I'm not in a position to make a statement about framework-level APIs, where I spent a significant amount of time (together with Tom and Alejandro) patching every single downstream component (except HBase -- I didn't patch it myself) to make it at least compile against .23.

Hope this helps to cut down on confusion.

Thanks,
Roman.
Re: [DISCUSS] Apache Hadoop 1.0?
Steve Loughran 2011-11-21, 11:17
On 17/11/11 19:31, Roman Shaposhnik wrote:
> On Thu, Nov 17, 2011 at 11:09 AM, Arun C Murthy<[EMAIL PROTECTED]> wrote:
>> I don't know which are the ones in 'every single downstream component' - care to enumerate?
>>
>> The ones I'm aware of, which have since been fixed are:
>> https://issues.apache.org/jira/browse/HBASE-4510 -> https://issues.apache.org/jira/browse/HDFS-2412 (we fixed the internal HDFS apis so that HBase can continue to use them)
>> https://issues.apache.org/jira/browse/PIG-2125 -> https://issues.apache.org/jira/browse/MAPREDUCE-3138 (
>> we fixed MR to allow apps deal with inconsistency in 'new' MR apis which changed in 0.21).
>>
>> I'm not aware of anything else - what else do you see?
>>
>> In summary, please take a careful look at the 'factual information' before you decide to proclaim your beliefs
>> about important aspects such as 'incompatibility' - it's key to ensure we don't confuse end-users and have a smooth adoption of newer releases.
>
> First of all, I would appreciate if you refrain from statements that
> sound like you're lecturing me on public mailing list.
>
> Here's what I said. Let me spell it out once again:
>
> "I believe that by now we have enough factual evidence that at least
> framework-level APIs are incompatible."
>
> Here's the umbrella Bigtop JIRA that tracks those incompatibilities:
> https://issues.apache.org/jira/browse/BIGTOP-162
>
> Nowhere in my email I implied that I *know* of cases where user-level
> APIs would break. That said,
> without a formal verification of backwards compatibility I can NOT
> make the inverse statement as well.

I think we went through that discussion of formality a while back; the conclusion being without a formal specification of semantics. What tends to burn code is the implicit expectations of behaviour - stuff that was never stated to be true, but which turned out to be so for a while.

I view all these bug reports as a sign of Bigtop's value to the release process - from that point of view: well done.

One thing I would like to see for build and test is the 0.23+ JARs in the public mvn repository, including the test ones (without any log4j files), so that downstream projects can work with them easily. I can see the 0.23.0 SNAPSHOT artifacts in the apache repository, but given the way M2 handles such things, I'd rather have stable versioned releases.
Re: [DISCUSS] Apache Hadoop 1.0?
Andrew Purtell 2011-11-17, 21:07
> From: Arun C Murthy <[EMAIL PROTECTED]>
> Now, a downstream project such as HBase, Hive or Pig isn't the 'normal
> end-user application'. These projects can choose to use
> undocumented/non-public (e.g. LimitedPrivate) apis and we are committed to
> working with them to ensure a smooth transition.
>
> I don't know which are the ones in 'every single downstream
> component' - care to enumerate?
>
> The ones I'm aware of, which have since been fixed are:
> https://issues.apache.org/jira/browse/HBASE-4510 ->
> https://issues.apache.org/jira/browse/HDFS-2412 (we fixed the internal HDFS apis
> so that HBase can continue to use them)
> https://issues.apache.org/jira/browse/PIG-2125 ->
> https://issues.apache.org/jira/browse/MAPREDUCE-3138 (we fixed MR to allow apps
> deal with inconsistency in 'new' MR apis which changed in 0.21).
>
> I'm not aware of anything else - what else do you see?
>
> As a result, the downstream projects ensure their own end-users and applications
> (HBase apps, Pig scripts, Hive queries) etc. do NOT see any incompatibilities.

In addition, MAPREDUCE-3169. I had to pull out a bunch of HBase unit tests today to get 0.92 compiling on 0.23.

- Andy
Re: [DISCUSS] Apache Hadoop 1.0?
Mahadev Konar 2011-11-17, 21:12
Andrew,
Can you open a jira listing the issues? Would be good to resolve them in the next 0.23 release.

thanks
mahadev

On Thu, Nov 17, 2011 at 1:07 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
>> From: Arun C Murthy <[EMAIL PROTECTED]>
>
>> Now, a downstream project such as HBase, Hive or Pig isn't the 'normal
>> end-user application'. These projects can choose to use
>> undocumented/non-public (e.g. LimitedPrivate) apis and we are committed to
>> working with them to ensure a smooth transition.
>>
>> I don't know which are the ones in 'every single downstream
>> component' - care to enumerate?
>>
>> The ones I'm aware of, which have since been fixed are:
>> https://issues.apache.org/jira/browse/HBASE-4510 ->
>> https://issues.apache.org/jira/browse/HDFS-2412 (we fixed the internal HDFS apis
>> so that HBase can continue to use them)
>> https://issues.apache.org/jira/browse/PIG-2125 ->
>> https://issues.apache.org/jira/browse/MAPREDUCE-3138 (we fixed MR to allow apps
>> deal with inconsistency in 'new' MR apis which changed in 0.21).
>>
>> I'm not aware of anything else - what else do you see?
>>
>> As a result, the downstream projects ensure their own end-users and applications
>> (HBase apps, Pig scripts, Hive queries) etc. do NOT see any incompatibilities.
>
> In addition, MAPREDUCE-3169. I had to pull out a bunch of HBase unit tests today to get 0.92 compiling on 0.23.
>
> - Andy
Re: [DISCUSS] Apache Hadoop 1.0?
Andrew Purtell 2011-11-17, 21:28
Hi Mahadev,
> From: Mahadev Konar <[EMAIL PROTECTED]>
>
> Andrew,
> Can you open a jira listing the issues? Would be good to resolve them
> in the next 0.23 release.
> [...]
>> In addition, MAPREDUCE-3169. I had to pull out a bunch of HBase unit tests
>> today to get 0.92 compiling on 0.23.

I opened HBASE-4813 and linked it to MAPREDUCE-3169.

Best regards,

- Andy
Re: [DISCUSS] Apache Hadoop 1.0?
Alejandro Abdelnur 2011-11-17, 19:17
On Thu, Nov 17, 2011 at 2:45 AM, Steve Loughran <[EMAIL PROTECTED]> wrote:
> ...
> What I will miss in 0.23 is the MiniMRCluster, which I consider to be part
> of the API. Certainly its why I pull in hadoop-common-test-0.20.20x.jar into
> downstream builds, because it is the simplest way to do basic tests in junit
> of MR operations. It's also the most lightweight way to do single-machine
> Hadoop runs over small datasets.

https://issues.apache.org/jira/browse/MAPREDUCE-3169