Todd Papaioannou
2012-03-18, 05:41
Harsh J
2012-03-18, 06:24
Konstantin Boudnik
2012-03-18, 17:01
Todd Papaioannou
2012-03-18, 18:04
Owen O'Malley
2012-03-19, 00:07
Dhruba Borthakur
2012-03-19, 18:10
Konstantin Shvachko
2012-03-19, 19:04
Milind.Bhandarkar@...
2012-03-19, 19:34
Arun C Murthy
2012-03-19, 21:47
Doug Cutting
2012-03-19, 21:56
Milind.Bhandarkar@...
2012-03-19, 22:00
Todd Lipcon
2012-03-19, 22:38
Doug Cutting
2012-03-19, 23:18
sanjay Radia
2012-03-19, 23:24
Chris Douglas
2012-03-19, 23:28
Roman Shaposhnik
2012-03-19, 23:29
Arun C Murthy
2012-03-19, 23:38
Arun C Murthy
2012-03-19, 23:39
Chris A Mattmann
2012-03-19, 23:43
Roman Shaposhnik
2012-03-19, 23:44
sanjay Radia
2012-03-20, 00:23
Konstantin Boudnik
2012-03-20, 00:36
Konstantin Shvachko
2012-03-20, 06:02
Konstantin Shvachko
2012-03-20, 06:16
Konstantin Shvachko
2012-03-20, 06:23
Eric Baldeschwieler
2012-03-20, 06:47
Todd Lipcon
2012-03-20, 17:17
Scott Carey
2012-03-20, 18:21
Scott Carey
2012-03-20, 18:29
Konstantin Shvachko
2012-03-22, 08:53
-
Naming of Hadoop releases Todd Papaioannou 2012-03-18, 05:41
All,

With the upcoming release of 0.23, isn't it about time that we started calling 0.23 "Hadoop 2.0" instead?

While the numbering system may make sense to everyone here, to the rest of the world it's going to be hella confusing for 0.23 to come out after Hadoop 1.0 was released. Since 0.23 has MR2 in it I think that it would make sense to call it 2.0. Also, I think it would really help with the brand awareness/perception of the project in the wider customer audience.

I know there are some other potential releases out there too, so my overall suggestion would be:

Current 1.X -> Remains 1.x (as new bug fix releases are released)
Current 0.22 -> Gets renamed to 1.5 (if it ever gets tested and released)
Current 0.23 -> Gets renamed to 2.0

Remember, a large part of the reason for renaming 0.20.xx to 1.0 was to make project progress more understandable to the rest of the world. We should ensure we don't regress with the next major release.

Thoughts?

ToddP
-
Re: Naming of Hadoop releases Harsh J 2012-03-18, 06:24
Hi,

Just one concern I wanted to express:

On Sun, Mar 18, 2012 at 11:11 AM, Todd Papaioannou <[EMAIL PROTECTED]> wrote:
[Snip]
> Current 1.X -> Remains 1.x (as new bug fix releases are released)
> Current 0.22 -> Gets renamed to 1.5 (if it ever gets tested and released)
> Current 0.23 -> Gets renamed to 2.0
[Snip]

With 1.x (1.1, for instance, is a release being planned, and I am not sure whether that will be the last one), moving 0.22 to 1.5 may cause confusion in future. Either we define a limit for the 1.x line, or we renumber this proposal.

--
Harsh J
-
Re: Naming of Hadoop releases Konstantin Boudnik 2012-03-18, 17:01
0.22 can't be considered as 1.5 because it is the major release from 1.0 (old 0.20.x). Besides, by declaring it as 1.5 we'll be planting future confusion of the same sort that happened around the 0.20* line.

And last but not least, the same discussion has happened in the past around 1.0 release time, like http://is.gd/x1fVqu

Cos
-
Re: Naming of Hadoop releases Todd Papaioannou 2012-03-18, 18:04
On Mar 18, 2012, at 10:01 AM, Konstantin Boudnik wrote:
> 0.22 can't be considered as 1.5 because it is the major release from 1.0 (old 0.20.x). Besides, by declaring it as 1.5 we'll be planting future confusion of the same sort that happened around 0.20* line.
>
> And last but not least, the same discussion has happened in the past around 1.0 release time like http://is.gd/x1fVqu

Yes I remember it well, but AFAIC there was no clear decision on 0.22 or 0.23. There were competing proposals and opinions and basically what happened was that we punted the decision on anything other than 0.20->1.0 until a later date. But, that later date is now approaching and we continue to call the current release in question 0.23. Hence my original email.

Personally, I do not believe 0.22 is sufficiently major to call it 2.0 and push 0.23 to 3.0. But that's just my $0.02. I don't feel strongly enough to worry about what the outcome is.

What I _do_ care strongly about is that we get some resolution and stop using 0.23 as a release name. It's confusing to the market and the customer base, and while we have made great progress in simplifying things with the 1.0 release moniker, we need to continue to make progress.

ToddP
-
Re: Naming of Hadoop releases Owen O'Malley 2012-03-19, 00:07
Without working security, I don't think 0.22 should be moved out of the 0.22.x release numbering. I see that branch-0.22 has 12 fixes since 0.22.0, which was 3 months ago. My read on those numbers is that there is some adoption and therefore bug fixes, but no one is making major changes in the branch.

In the same period of time, branch-0.23 has had 3447 commits and has built up a community that is quickly pushing it forward. I'd support renaming the future 0.23 releases as 2.x.y.

-- Owen
-
Re: Naming of Hadoop releases Dhruba Borthakur 2012-03-19, 18:10
My vote would be to leave 0.22 as it is, and rename 0.23 as Hadoop 2.0.

-dhruba

--
Subscribe to my posts at http://www.facebook.com/dhruba
-
Re: Naming of Hadoop releases Konstantin Shvachko 2012-03-19, 19:04
Hadoop naming is definitely confusing. And having Hadoop-1 did not make it less confusing for users.

> Current 0.22 -> Gets renamed to 1.5 (if it ever gets tested and released)

It was released on November 29, 2011.
eBay is actively using it as of today.

If the goal of renaming branches is to make things less confusing about Hadoop, then I agree with people saying we should do a simultaneous rename of the branches. That is

Current 0.22 -> 2
Current 0.23 -> 3

It almost sounds like release .22 does not deserve a whole number, only a fraction. But having .22 renamed to 1.5 creates a confusion that it belongs to Hadoop-1 line, which is not exactly the message we want to send out.
Also I don't know what the number of commits reflects, and whether it is good or not to have many for a particular release.

If the community decides to rename .22 to 2 I will be glad to work on it.

Thanks,
--Konstantin
-
Re: Naming of Hadoop releases Milind.Bhandarkar@... 2012-03-19, 19:34
I agree with Konstantin. In previous discussion, I had suggested simultaneous renumbering, but for some reason it was not considered.

(For history buffs: I upgraded from Windows 1.0 to Windows 3.1 straight. Windows 2.0 did not have many features that made it compelling to upgrade. It did not seem odd to skip a number then, and I don't see why it would now. I also skipped Windows Vista and upgraded from XP to Windows 7, even if Vista was touted as a major release.)

- Milind

---
Milind Bhandarkar
Chief Architect, Greenplum Labs, Data Computing Division, EMC
+1-650-523-3858 (W)
+1-408-666-8483 (M)
-
Re: Naming of Hadoop releases Arun C Murthy 2012-03-19, 21:47
Konstantin and Milind,

As I've noted on the other thread (my bad):

> However, the problem is that hadoop-0.22 has removed public and non-deprecated apis/features (i.e. security) which are present in branch-1 (previously branch-0.20.2xx).
>
> This is against the Apache Hadoop release policy on major releases i.e. only features deprecated for at least one release can be removed.

This is a long standing issue with branch-0.22 - are either of you planning on fixing this? If so, could you please share some roadmap/timelines?

thanks,
Arun

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
-
Re: Naming of Hadoop releases Doug Cutting 2012-03-19, 21:56
On 03/19/2012 02:47 PM, Arun C Murthy wrote:
> This is against the Apache Hadoop release policy on major releases i.e. only features deprecated for at least one release can be removed.

In many cases the reason this happened was that features were backported from trunk to 0.20 but not to 0.22. In other words, it's no fault of the folks who were working on branch 0.22.

So a related policy we might add to prevent such situations in the future might be that if you backport something from branch n to n-2 then you ought to also be required to backport it to branch n-1 and in general to all intervening branches. Does that seem sensible?

Doug
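
The proposed rule (a backport from branch n to n-2 must also land on n-1 and every other intervening branch) can be read as a simple range computation over an ordered list of branches. The sketch below is only an illustration of that reading: the branch list, its ordering, and the function name `required_backport_targets` are assumptions made for the example, not anything the project defines.

```python
# Hypothetical branch ordering, oldest first. Illustrative only; the real
# set of Hadoop branches and their ordering is not taken from this thread.
RELEASE_BRANCHES = ["0.20", "0.22", "0.23", "trunk"]


def required_backport_targets(source, target, branches=RELEASE_BRANCHES):
    """Return every branch a fix must land on under the proposed rule.

    A fix that exists on `source` and is backported to the older `target`
    must also be applied to each branch that sits between the two.
    """
    src = branches.index(source)
    dst = branches.index(target)
    if dst >= src:
        raise ValueError("target must be an older branch than source")
    # The target itself plus all intervening branches, oldest first.
    return list(branches[dst:src])


if __name__ == "__main__":
    print(required_backport_targets("trunk", "0.20"))
```

Under these assumptions, a trunk fix backported to 0.20 would also have to be applied to 0.22 and 0.23, which is the per-backport cost the rule would add.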
-
Re: Naming of Hadoop releases Milind.Bhandarkar@... 2012-03-19, 22:00
Arun,

As Konstantin has noted in the email below:

> If the community decides to rename .22 to 2 I will be glad to work on it.

My inclination (as I have communicated to several people at apachecon) is to upgrade our clusters from 1.0 to 0.23 (whatever it is called when it becomes stable). The reason for this is the pain of backporting patches to 0.22 from trunk (because of mavenization.)

This does not mean that 0.22 is abandoned. Other than eBay, I know of two sizeable deployments of 0.22 in universities. And other than removing LinuxTaskController (which those folks apparently do not care about), there are many more features in 0.22 that they *do* care about.

- Milind
-
Re: Naming of Hadoop releases Todd Lipcon 2012-03-19, 22:38
On Mon, Mar 19, 2012 at 2:56 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:
> On 03/19/2012 02:47 PM, Arun C Murthy wrote:
>> This is against the Apache Hadoop release policy on major releases i.e. only features deprecated for at least one release can be removed.
>
> In many cases the reason this happened was that features were backported from trunk to 0.20 but not to 0.22. In other words, it's no fault of the folks who were working on branch 0.22.

I agree that it's no fault of the folks on 0.22.

> So a related policy we might add to prevent such situations in the future might be that if you backport something from branch n to n-2 then you ought to also be required to backport it to branch n-1 and in general to all intervening branches. Does that seem sensible?

-1 on this requirement. Otherwise the cost of backporting something to the stable line becomes really high, and we'll end up with distributors just maintaining their own branches outside of Apache (the state we were in with 0.20.x).

On the other hand, it does suck for users if they update from "1.x" to "2.x" and they end up losing some bug fixes or features they previously were running.

Unfortunately, I don't have a better solution in mind that resolves the above problems - I just don't think it's tenable to combine a policy like "anyone may make a release branch off trunk and claim a major version number" with another policy like "you have to port a fix to all intermediate versions in order to port a fix to any of them". If a group of committers wants to make a release branch, then the maintenance of that branch should be up to them.

-Todd

--
Todd Lipcon
Software Engineer, Cloudera
-
Re: Naming of Hadoop releases Doug Cutting 2012-03-19, 23:18
On 03/19/2012 03:38 PM, Todd Lipcon wrote:
> Unfortunately, I don't have a better solution in mind that resolves the above problems - I just don't think it's tenable to combine a policy like "anyone may make a release branch off trunk and claim a major version number" with another policy like "you have to port a fix to all intermediate versions in order to port a fix to any of them". If a group of committers wants to make a release branch, then the maintenance of that branch should be up to them.

I don't think it's the case that anyone can create a new major version number. Creation of new release branches should be an activity of the PMC, not individuals. As such, the majority of the PMC implicitly or explicitly approves such branches and the PMC must responsibly deal with the ensuing results. We should not operate as a fragmented community creating a fragmented, overlapping set of products.

As I recall, when the 0.22 release branch was created it was intended by the PMC to be a release branch that followed 0.20 and preceded 0.23. Since then we as a PMC have acted inconsistently and now must deal with the consequences.

We've already made an exception to our policies in releasing 0.20.20x and 0.22 that regresses from it in some areas. We now need to decide whether we want to continue that exception by renaming 0.22 to 2.0 or not. It does not look like we'll reach consensus on this. That's unfortunate, but we still need to answer it.

We also should decide whether we want to permit ourselves to get in this pinch again. I think it's avoidable if in the future we only make releases that are consistent with our other policies. Backports should be easier for intervening releases. We might reasonably grandfather 0.22 as skippable for backports as it's too late to fix that now.

Doug
-
Re: Naming of Hadoop releases sanjay Radia 2012-03-19, 23:24
On Mar 19, 2012, at 12:04 PM, Konstantin Shvachko wrote:
>> Current 0.22 -> Gets renamed to 1.5 (if it ever gets tested and released)
>
> It was released on November 29, 2011.
> eBay is actively using it as of today.

Konstantin,

For a release to be truly viable, it has to be deployed in production and followed on with bug fixes over time. Otherwise the release may not stabilize and it becomes confusing for users.

Is eBay already using it in *production* (you had used the word "actively" and perhaps you meant "production"), and are you planning to do bug fixes and periodic releases?

On the other hand, is eBay planning to instead fork 0.22 and maintain an internal release for custom improvements?

sanjay
-
Re: Naming of Hadoop releases Chris Douglas 2012-03-19, 23:28
-1. I agree with Todd; we tried this policy before and the project didn't produce a usable release for two years. Its benefits are fiction and its harm is documented.

However 0.22 is (or isn't) released, no general policy is required and nobody should waste their time trying to define one. Releases- including versions- are by majority vote. Either the developers of the 0.22 series convince most of the PMC that the release series warrants a major version, they elect to continue development on the 0.22 series, or they fork the code and create a new project. Those are always the only outcomes and the reasoning will be ad hoc by definition.

My opinion: version numbers are cheap. As long as 0.22 has contributors interested in pursuing that line of development, reserving a series for that work to be released is not unreasonable. Confining it to 0.22.xxx presumes it will fail, while a major version should give its maintainers sufficient flexibility to define compatibility, etc.

-C
-
Re: Naming of Hadoop releases Roman Shaposhnik 2012-03-19, 23:29
On Mon, Mar 19, 2012 at 3:38 PM, Todd Lipcon <[EMAIL PROTECTED]> wrote:
> On the other hand, it does suck for users if they update from "1.x" to "2.x" and they end up losing some bug fixes or features they previously were running.

Keep in mind that nobody's proposing to rename .22 branch into 2 right away. This will be addressed at a later point during one of the (hopefully) regular releases off of that branch.

Reserving branch-2 version slot is cheap and IMHO doesn't complicate anything. It does, however, give .22 branch a right to mature.

Thanks,
Roman.
-
Re: Naming of Hadoop releases Arun C Murthy 2012-03-19, 23:38
On Mar 19, 2012, at 4:18 PM, Doug Cutting wrote:
> We also should decide whether we want to permit ourselves to get in this pinch again. I think it's avoidable if in the future we only make releases that are consistent with our other policies. Backports should be easier for intervening releases. We might reasonably grandfather 0.22 as skippable for backports as it's too late to fix that now.

Agree.

0.22 & security is somewhat of a special case:
# hadoop-0.20.203 (with security) was released in May 2011.
# Folks working on branch-0.22 abandoned security much later, in spite of advice against doing so.

Thus, there has been sufficient opportunity to fix security in branch-0.22 since it's been repeatedly pointed out.

Again, as I mentioned previously, please note that none of this is meant to discourage folks on 0.22 or further releases off the branch. It's just hard to call it hadoop-2 given its current state or known roadmap.

So, let me ask again - is there anyone willing to step up and fix security in branch-0.22? If not, and I haven't seen evidence to the contrary for a very long time now, IMHO this discussion is moot.

IAC, at this point everyone is sufficiently entrenched and I don't expect that anyone will be swayed anymore, so we could just call a vote with the options discussed and decide.

thanks,
Arun
-
Re: Naming of Hadoop releasesArun C Murthy 2012-03-19, 23:39
Roman & Milind: asking again - would you guys be willing to step up and fix 0.22 to be more reasonable as a hadoop-2 candidate i.e. fix/validate security?
On Mar 19, 2012, at 4:29 PM, Roman Shaposhnik wrote: > On Mon, Mar 19, 2012 at 3:38 PM, Todd Lipcon <[EMAIL PROTECTED]> wrote: >> On the other hand, it does suck for users if they update from "1.x" to >> "2.x" and they end up losing some bug fixes or features they >> previously were running. > > Keep in mind that nobody's proposing to rename .22 branch into 2 right > away. This will be addressed at a later point during one of the (hopefully) > regular releases off of that branch. > > Reserving branch-2 version slot is cheap and IMHO doesn't complicate > anything. It does, however, give .22 branch a right to mature. > > Thanks, > Roman. -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/
-
Re: Naming of Hadoop releasesChris A Mattmann 2012-03-19, 23:43
On Mar 20, 2012, at 12:28 AM, Chris Douglas wrote:
> -1. I agree with Todd; we tried this policy before and the project > didn't produce a usable release for two years. Its benefits are > fiction and its harm is documented. > > However 0.22 is (or isn't) released, no general policy is required and > nobody should waste their time trying to define one. Releases- > including versions- are by majority vote. Either the developers of the > 0.22 series convince most of the PMC that the release series warrants > a major version, they elect to continue development on the 0.22 > series, or they fork the code and create a new project. Those are > always the only outcomes and the reasoning will be ad hoc by > definition. > > My opinion: version numbers are cheap. As long as 0.22 has > contributors interested in pursuing that line of development, > reserving a series for that work to be released is not unreasonable. > Confining it to 0.22.xxx presumes it will fail, while a major version > should give its maintainers sufficient flexibility to define > compatibility, etc. -C Well stated, Chris, +1 (non-binding) from me. Cheers, Chris ++++++++++++++++++++++++++++++++++++++++++++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: [EMAIL PROTECTED] WWW: http://sunset.usc.edu/~mattmann/ ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Adjunct Assistant Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-
Re: Naming of Hadoop releasesRoman Shaposhnik 2012-03-19, 23:44
Arun, let me answer both of your questions in the same reply:
On Mon, Mar 19, 2012 at 4:38 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote: > So, let me ask again - is there anyone willing to step up and fix security in branch-0.22? > > If not, and I haven't seen evidence to the contrary for a very long time now, IMHO this discussion is moot. Which discussion? The one that I'm interested in is a very simple issue -- should we reserve the version slot 2.X.Y or not. That has very little to do with the current state of security. > Roman & Milind: asking again - would you guys be willing to step up and fix 0.22 to be more > reasonable as a hadoop-2 candidate i.e. fix/validate security? Sure. I might end up doing some work on it within the Bigtop framework. That said, I can't really give you a deadline as of now. Hence my desire to separate these 2 discussions. Thanks, Roman.
-
Re: Naming of Hadoop releasessanjay Radia 2012-03-20, 00:23
On Mar 19, 2012, at 4:28 PM, Chris Douglas wrote: > However 0.22 is (or isn't) released, no general policy is required and > nobody should waste their time trying to define one. Releases- > including versions- are by majority vote. Either the developers of the > 0.22 series convince most of the PMC that the release series warrants > a major version, they elect to continue development on the 0.22 > series, or they fork the code and create a new project. Those are > always the only outcomes and the reasoning will be ad hoc by > definition. > > My opinion: version numbers are cheap. As long as 0.22 has > contributors interested in pursuing that line of development, > reserving a series for that work to be released is not unreasonable. > Confining it to 0.22.xxx presumes it will fail, while a major version > should give its maintainers sufficient flexibility to define > compatibility, etc. -C Well put, Chris.
-
Re: Naming of Hadoop releasesKonstantin Boudnik 2012-03-20, 00:36
On Mon, Mar 19, 2012 at 12:04PM, Konstantin Shvachko wrote:
> Hadoop naming is definitely confusing. And having Hadoop-1 did not > make it less confusing for users. > > > Current 0.22 -> Gets renamed to 1.5 (if it ever gets tested and released) > > It was released on November 29, 2011. > eBay is actively using it as of today. > > If the goal of renaming branches is to make things less confusing > about Hadoop, then I agree with people saying we should do a > simultaneous rename of the branches. That is > Current 0.22 -> 2 > Current 0.23 -> 3 > > It almost sounds like release .22 does not deserve a whole number, > only a fraction. But having .22 renamed to 1.5 creates a confusion > that it belongs to Hadoop-1 line, which is not exactly the message we > want to send out. > Also I don't know what the number of commits reflects, and whether it > is good or not to have many for a particular release. > > If the community decides to rename .22 to 2 I will be glad to work on it. Count me in. Cos > Thanks, > --Konstantin > > On Sun, Mar 18, 2012 at 11:04 AM, Todd Papaioannou > <[EMAIL PROTECTED]> wrote: > > On Mar 18, 2012, at 10:01 AM, Konstantin Boudnik wrote: > > > >> 0.22 can't be considered as 1.5 because it is the major release from 1.0 (old > >> 0.20.x). Besides, by declaring it as 1.5 we'll be planting future confusion of > >> the same sort that happened around 0.20* line. > >> > >> And last but not least, the same discussion has happened in the past around > >> 1.0 release time like http://is.gd/x1fVqu > > > > Yes I remember it well, but AFAIC there was no clear decision on 0.22 or 0.23. There were competing proposals and opinions and basically what happened was that we punted the decision on anything other than 0.20->1.0 until a later date. But, that later date is now approaching and we continue to call the current release in question 0.23. Hence my original email. > > > > Personally, I do not believe 0.22 is sufficiently major to call it 2.0 and push 0.23 to 3.0. But that's just my $0.02.
I don't feel strongly enough to worry about what the outcome is. > > > > What I _do_ care strongly about is that we get some resolution and stop using 0.23 as a release name. It's confusing to the market and the customer base, and while we have made great progress in simplifying things with the 1.0 release moniker, we need to continue to make progress. > > > > ToddP > > > > > > > >> > >> Cos > >> > >> > >> On Sat, Mar 17, 2012 at 10:41PM, Todd Papaioannou wrote: > >>> All, > >>> > >>> With the upcoming release of 0.23, isn't it about time that we started calling 0.23 "Hadoop 2.0" instead? > >>> > >>> While the numbering system may make sense to everyone here, to the rest of the world it's going to be hella confusing for 0.23 to come out after Hadoop 1.0 was released. Since 0.23 has MR2 in it I think that it would make sense to call it 2.0. Also, I think would really help with the brand awareness/perception of the project in the wider customer audience. > >>> > >>> I know there are some other potential releases out there too, so my overall suggestion would be: > >>> > >>> Current 1.X -> Remains 1.x (as new bug fix releases are released) > >>> Current 0.22 -> Gets renamed to 1.5 (if it ever gets tested and released) > >>> Current 0.23 -> Gets renamed to 2.0 > >>> > >>> Remember, a large part of the reason for renaming 0.20.xx to 1.0 was to make project progress more understandable to the rest of the world. We should ensure we don't regress with the next major release. > >>> > >>> Thoughts? > >>> > >>> ToddP > >>> > >
-
Re: Naming of Hadoop releasesKonstantin Shvachko 2012-03-20, 06:02
<Doug>
> to prevent such situations in the future might be that if you backport > something from branch n to n-2 then you ought to also be required to > backport it to branch n-1 and in general to all intervening branches. This is imo the most important topic in the discussion. I support Doug's proposal, because it provides forward-moving evolution of the project, with releases being driven by the necessity to introduce new features, so that we could avoid back- and forward-porting overhead, which exhausts the community resources. <Arun> > This is against the Apache Hadoop release policy on major releases i.e. > only features deprecated for at least one release can be removed. Not sure if this is the Apache Hadoop release policy, but we as PMC were inconsistent in allowing decisions to implement new features in old releases, namely the 0.20 series, instead of creating new releases with those new features. This is the reason why security and other good features are not in 0.22. Feature freeze has been broken so many times for the .20 branch, so that it became a norm for the entire project rather than an exception, which we had in the past. I don't understand this constant segregation against Hadoop .22. It is a perfectly usable version of Hadoop. It would be waste not to have it released. Very glad that universities adopted it. If somebody needs security there is a number of choices, Hadoop-1 being the first. But if you cannot afford stand-alone HBase clusters or need to combine general Hadoop and HBase loads there is nothing else but Hadoop 0.22 at this point. When .23 is stable I will be glad to use it. But the steady stream of feature ports makes it hard to decide how stable it is and to predict when it is ready. I am advocating to stop porting features and start releasing them. If .23 is Federation + Yarn, then 0.23 + HA is 0.24; plus PB - going to 0.25, etc. Thought I should clarify what I mean by forward-going progress. Hope it makes sense. 
Thanks, --Konstantin On Mon, Mar 19, 2012 at 2:56 PM, Doug Cutting <[EMAIL PROTECTED]> wrote: > On 03/19/2012 02:47 PM, Arun C Murthy wrote: >> This is against the Apache Hadoop release policy on major releases i.e. only features deprecated for at least one release can be removed. > > In many case the reason this happened was that features were backported > from trunk to 0.20 but not to 0.22. In other words, its no fault of the > folks who were working on branch 0.22. So a related policy we might add > to prevent such situations in the future might be that if you backport > something from branch n to n-2 then you ought to also be required to > backport it to branch n-1 and in general to all intervening branches. > Does that seem sensible? > > Doug
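Editorial sketch of the backport rule quoted above ("if you backport something from branch n to n-2 then you ought to also be required to backport it to branch n-1 and in general to all intervening branches"). The branch list and the function name `required_backport_targets` are illustrative assumptions, not anything the project defines; the point is only to show which branches a single backport would oblige under that rule.

```python
from typing import List

# Hypothetical branch ordering, oldest to newest. The names come from the
# thread, but the list itself is an assumption made for illustration.
BRANCHES: List[str] = ["0.20", "0.22", "0.23", "trunk"]


def required_backport_targets(branches: List[str], source: str, target: str) -> List[str]:
    """Return every branch a change must land on under the proposed rule.

    A backport from `source` to an older `target` also obliges a backport
    to each branch that sits between them in release order.
    """
    src = branches.index(source)
    dst = branches.index(target)
    if dst >= src:
        raise ValueError("target must be older than source")
    # The target itself plus all intervening branches, oldest first.
    return branches[dst:src]


if __name__ == "__main__":
    # The case discussed in the thread: a feature goes from trunk to 0.20.
    print(required_backport_targets(BRANCHES, "trunk", "0.20"))
```

The cost of the rule grows with the number of live branches between source and target, which is the objection raised elsewhere in the thread.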
-
Re: Naming of Hadoop releasesKonstantin Shvachko 2012-03-20, 06:16
> This is a long standing issue with branch-0.22 - are either of you planning on fixing this?
I personally do not have plans to fix security in .22. I don't think we should target it. I hope 0.23 will be a replacement for it by summer. Is it still in your roadmap, Arun? I also don't think that this should be a requirement for renaming the release, at least I haven't seen anything about it in the Apache Hadoop policies. > could you please share some roadmap/timelines? I did discuss my roadmap with my managers. Sorry don't have anything to share. Thanks, --Konstantin On Mon, Mar 19, 2012 at 2:47 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote: > Konstantin and Milind, > > As I've noted on the other thread (my bad): > >> However, the problem is that hadoop-0.22 has removed public and non-deprecated apis/features (i.e. security) which are present in branch-1 (previously branch-0.20.2xx). >> >> This is against the Apache Hadoop release policy on major releases i.e. only features deprecated for at least one release can be removed. > > This is a long standing issue with branch-0.22 - are either of you planning on fixing this? If so, could you please share some roadmap/timelines? > > thanks, > Arun > > On Mar 19, 2012, at 12:34 PM, <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> wrote: > >> I agree with Konstantin. In previous discussion, I had suggested >> simultaneous renumbering, but for some reason it was not considered. >> >> (For history buffs: I upgraded from Windows 1.0 to Windows 3.1 straight. >> Windows 2.0 did not have many features that made it compelling to upgrade. >> It did not seem odd to skip a number then, and I don't see why it would >> now. I also skipped Windows Vista and upgraded from XP to Windows 7, even >> if Vista was touted as a major release.) >> >> - Milind >> >> --- >> Milind Bhandarkar >> Chief Architect, Greenplum Labs, Data Computing Division, EMC >> +1-650-523-3858 (W) >> +1-408-666-8483 (M) >> >> >> >> On 3/19/12 12:04 PM, "Konstantin Shvachko" <[EMAIL PROTECTED]> wrote: >> >>> Hadoop naming is definitely confusing. 
And having Hadoop-1 did not >>> make it less confusing for users. >>> >>>> Current 0.22 -> Gets renamed to 1.5 (if it ever gets tested and >>>> released) >>> >>> It was released on November 29, 2011. >>> eBay is actively using it as of today. >>> >>> If the goal of renaming branches is to make things less confusing >>> about Hadoop, then I agree with people saying we should do a >>> simultaneous rename of the branches. That is >>> Current 0.22 -> 2 >>> Current 0.23 -> 3 >>> >>> It almost sounds like release .22 does not deserve a whole number, >>> only a fraction. But having .22 renamed to 1.5 creates a confusion >>> that it belongs to Hadoop-1 line, which is not exactly the message we >>> want to send out. >>> Also I don't know what the number of commits reflects, and whether it >>> is good or not to have many for a particular release. >>> >>> If the community decides to rename .22 to 2 I will be glad to work on it. >>> >>> Thanks, >>> --Konstantin >>> >>> On Sun, Mar 18, 2012 at 11:04 AM, Todd Papaioannou >>> <[EMAIL PROTECTED]> wrote: >>>> On Mar 18, 2012, at 10:01 AM, Konstantin Boudnik wrote: >>>> >>>>> 0.22 can't be considered as 1.5 because it is the major release from >>>>> 1.0 (old >>>>> 0.20.x). Besides, by declaring it as 1.5 we'll be planting future >>>>> confusion of >>>>> the same sort that happened around 0.20* line. >>>>> >>>>> And last but not least, the same discussion has happened in the past >>>>> around >>>>> 1.0 release time like http://is.gd/x1fVqu >>>> >>>> Yes I remember it well, but AFAIC there was no clear decision on 0.22 >>>> or 0.23. There were competing proposals and opinions and basically what >>>> happened was that we punted the decision on anything other than >>>> 0.20->1.0 until a later date. But, that later date is now approaching >>>> and we continue to call the current release in question 0.23. Hence my >>>> original email. >>>> >>
-
Re: Naming of Hadoop releasesKonstantin Shvachko 2012-03-20, 06:23
Sanjay,
Yes, I plan to continue fixing bugs as long as I use the branch, and release it if the need arises. I hope there won't be many required with 0.23 progressing as planned. Thanks, --Konstantin On Mon, Mar 19, 2012 at 4:24 PM, sanjay Radia <[EMAIL PROTECTED]> wrote: > > On Mar 19, 2012, at 12:04 PM, Konstantin Shvachko wrote: > > Current 0.22 -> Gets renamed to 1.5 (if it ever gets tested and released) > > > It was released on November 29, 2011. > eBay is actively using it as of today. > > > Konstantin, > > For a release to be truly viable, it has to be deployed in production and > followed on with > bug fixes over time. Otherwise the release may not stabilize and it becomes > confusing for users. > > Is eBay already using it in *production* (you had used the words "actively" > and perhaps you meant > "production") and are you planning to do bug fixes and periodic releases? > On the other hand, is eBay planning to instead fork 0.22 and maintain an > internal release for custom improvements? > > > sanjay >
-
Re: Naming of Hadoop releasesEric Baldeschwieler 2012-03-20, 06:47
Lots of good stuff on this thread. Todd, Chris and Todd have made great points. (+1)
Doug, I think you have misdiagnosed the problem (in your comment below). IMO the problem at the time of the creation of the 0.20.2xx was that the Hadoop community had not produced a stable release for years and none of 0.21, 0.22 or 0.23 were converging quickly to a stable release. At the time, we had a lot of debate and concluded that 0.20.2xx was very much in the spirit of Apache. We concluded that any committer is free to create any branch and can call a vote to release it if they choose. Looking back, this open process has allowed us to make a lot of progress! We've created a bit of naming messiness for sure, but let's look at the gains. The 0.20.2xx release is good enough that the PMC chose to promote it to Hadoop 1.0. Further, the 0.22 line is now good enough that eBay runs it in production and 0.23 continues to make great forward progress. By allowing different contributors to pursue different agendas within the community, we have successfully produced stable releases and collected a lot of work within Apache Hadoop that was previously forced to live in private patch sets and branches. We now may (or may not) agree on new naming for the other releases. In the end, the community is better off with more choices and more progress. This seems like a perfect blend of meritocracy and democracy. Let's not discourage backporting and other contributions. We have release masters, patch reviews and feedback to stop bad ideas and votes to choose between competing visions of what comes next. Let's have fewer bylaws and allow messiness when needed.
E14 On Mar 19, 2012, at 4:18 PM, Doug Cutting wrote: > On 03/19/2012 03:38 PM, Todd Lipcon wrote: >> Unfortunately, I don't have a better solution in mind that resolves >> the above problems - I just don't think it's tenable to combine a >> policy like "anyone may make a release branch off trunk and claim a >> major version number" with another policy like "you have to port a fix >> to all intermediate versions in order to port a fix to any of them". >> If a group of committers wants to make a release branch, then the >> maintenance of that branch should be up to them. > > I don't think it's the case that anyone can create a new major version > number. Creation of new release branches should be an activity of the > PMC, not individuals. As such, the majority of the PMC implicitly or > explicitly approves such branches and the PMC must responsibly deal with > the ensuing results. We should not operate as a fragmented community > creating a fragmented, overlapping set of products. > > As I recall, when the 0.22 release branch was created it was intended by > the PMC to be a release branch that followed 0.20 and preceded 0.23. > Since then we as a PMC have acted inconsistently and now must deal with > the consequences. > > We've already made an exception to our policies in releasing 0.20.20x > and 0.22 that regresses from it in some areas. We now need to decide > whether we want to continue that exception by renaming 0.22 to 2.0 or > not. It does not look like we'll reach consensus on this. That's > unfortunate, but we still need to answer it. > > We also should decide whether we want to permit ourselves to get in this > pinch again. I think it's avoidable if in the future we only make > releases that are consistent with our other policies. Backports should > be easier for intervening releases. We might reasonably grandfather > 0.22 as skippable for backports as it's too late to fix that now. > > Doug
-
Re: Naming of Hadoop releasesTodd Lipcon 2012-03-20, 17:17
On Mon, Mar 19, 2012 at 11:02 PM, Konstantin Shvachko
<[EMAIL PROTECTED]> wrote: > Feature freeze has been broken so many times for the .20 branch, so > that it became a norm for the entire project rather than an exception, > which we had in the past. I agree we should be stricter about what feature backports we allow into "stable" branches. Security and hflush were both necessary evils - I'm glad now that we have them, but we should try to stay out of these types of situations in the future where we feel compelled to backport (or re-do in the case of hflush/sync) such large items. > > I don't understand this constant segregation against Hadoop .22. It is > a perfectly usable version of Hadoop. It would be waste not to have it > released. Very glad that universities adopted it. If somebody needs > security there is a number of choices, Hadoop-1 being the first. But > if you cannot afford stand-alone HBase clusters or need to combine > general Hadoop and HBase loads there is nothing else but Hadoop 0.22 > at this point. I don't see what HBase has to do with it. In fact HBase runs way better on 1.x compared to 0.22. The tests don't even pass on 0.22 due to differences in the append semantics in 0.21+ compared to 0.20. Every production HBase deploy I know about runs on a 1.x-based distribution. You could argue this is selection bias by nature of my employer, but the same is true based on emails to the hbase-user lists, etc. This is orthogonal to the discussion at hand, I just wanted to correct this lest any users get the wrong perception and migrate their HBase clusters to a version which is rarely used and strictly inferior for this use case. -Todd -- Todd Lipcon Software Engineer, Cloudera
-
Re: Naming of Hadoop releasesScott Carey 2012-03-20, 18:21
On 3/19/12 3:38 PM, "Todd Lipcon" <[EMAIL PROTECTED]> wrote: > >> So a related policy we might add >> to prevent such situations in the future might be that if you backport >> something from branch n to n-2 then you ought to also be required to >> backport it to branch n-1 and in general to all intervening branches. >> Does that seem sensible? > >-1 on this requirement. Otherwise the cost of backporting something to >the stable line becomes really high, and we'll end up with >distributors just maintaining their own branches outside of Apache >(the state we were in with 0.20.x). > >On the other hand, it does suck for users if they update from "1.x" to >"2.x" and they end up losing some bug fixes or features they >previously were running.
Users should not expect upgrades from one major version to another to be trivial. Documentation on feature differences and upgrade paths is expected for major changes. It would be really bad if you went from 1.0.1 to 1.0.2 and lost security. Upgrading from 1.x to 2.x and gaining/losing features is a non-issue IMO. A major version bump indicates major differences. These will necessarily mean dropping of old features over time. Rules on major versions about feature supersets or deprecate-first requirements end up being meaningless over time (the former is not tenable, the latter easy to circumvent with a short lived release). It is saner to define what major/minor/patch versions mean for releases by default and then for each specific release note if that differs from the norm. I propose:
* Major version increment: significant API or backwards compatibility affecting changes may be present. See documentation for details.
* Minor version increment: API changes likely. New features typically added. Removed features documented. Expectation of backwards compatibility unless otherwise noted in the release.
* Patch version increment: fully backwards compatible in API and Operation. Bug fixes and minor features that
I don't see any logic at all as to why 0.22 can't be 2.0 -- a major version number change indicates major differences. If one of those differences is that security is not present, so what? There cannot be any restriction on what can change across major number bumps. _Any conceivable restriction_ to what can change in a major release is flawed. At some point, that feature or API may need to expire.
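Editorial sketch of the major/minor/patch scheme proposed in the message above, assuming three-part dotted version strings such as "1.0.2". The names `Version`, `parse`, `increment_kind` and `EXPECTATIONS` are hypothetical and the expectation strings paraphrase the proposal; this is not an Apache Hadoop policy or tool.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse(text: str) -> Version:
    """Parse a 'major.minor.patch' string (assumed format) into a Version."""
    major, minor, patch = (int(part) for part in text.split("."))
    return Version(major, minor, patch)


# Default expectations per increment type, paraphrased from the proposal.
EXPECTATIONS = {
    "major": "significant API or compatibility-affecting changes may be present",
    "minor": "API changes likely; backwards compatible unless the release says otherwise",
    "patch": "fully backwards compatible in API and operation",
}


def increment_kind(old: Version, new: Version) -> str:
    """Classify an upgrade by the most significant component that changed."""
    if new <= old:  # tuples compare component by component
        raise ValueError("new version must be greater than old version")
    if new.major != old.major:
        return "major"
    if new.minor != old.minor:
        return "minor"
    return "patch"


if __name__ == "__main__":
    for old, new in [("1.0.1", "1.0.2"), ("1.0.2", "2.0.0")]:
        kind = increment_kind(parse(old), parse(new))
        print(f"{old} -> {new}: {kind} ({EXPECTATIONS[kind]})")
```

Under this reading, losing security between 1.0.1 and 1.0.2 would violate the patch expectation, while the same change between 1.x and 2.x would only need to be documented.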
-
Re: Naming of Hadoop releasesScott Carey 2012-03-20, 18:29
On 3/19/12 11:02 PM, "Konstantin Shvachko" <[EMAIL PROTECTED]> wrote: ><Doug> >> to prevent such situations in the future might be that if you backport >> something from branch n to n-2 then you ought to also be required to >> backport it to branch n-1 and in general to all intervening branches. > >This is imo the most important topic in the discussion. >I support Doug's proposal, because it provides forward-moving >evolution of the project, >with releases being driven by the necessity to introduce new features, >so that we could avoid back- and forward-porting overhead, which >exhausts the community resources. I believe this is untenable. You cannot guarantee that Hadoop 11.x will have all the features as Hadoop 3.x. As such, a backport from 11.x to 2.x for some reason should not imply porting all the way down the chain. One cannot foresee which of those intermediate versions are still active or live in advance. When a major version number changes, all bets are off. The release may completely overhaul an API, or it may not. Assumptions of linear progress break down. Perhaps such a rule for branches within a major release line make sense where one can reasonably expect to be able to maintain some expectation of linear progress. However not all major versions will have such an assumption, and the same issues will apply. The more difficult you make it for an organization to share its work with the community (i.e. create a branch) the more likely they will work on it on the side and not in the community. > ><Arun> >> This is against the Apache Hadoop release policy on major releases i.e. >> only features deprecated for at least one release can be removed. > >Not sure if this is the Apache Hadoop release policy, but >we as PMC were inconsistent in allowing decisions to implement new >features in old releases, namely the 0.20 series, instead of creating >new releases with those new features. This is the reason why security >and other good features are not in 0.22. 
>Feature freeze has been broken so many times for the .20 branch, so >that it became a norm for the entire project rather than an exception, >which we had in the past. > >I don't understand this constant segregation against Hadoop .22. It is >a perfectly usable version of Hadoop. It would be waste not to have it >released. Very glad that universities adopted it. If somebody needs >security there is a number of choices, Hadoop-1 being the first. But >if you cannot afford stand-alone HBase clusters or need to combine >general Hadoop and HBase loads there is nothing else but Hadoop 0.22 >at this point. > >When .23 is stable I will be glad to use it. But the steady stream of >feature ports makes it hard to decide how stable it is and to predict >when it is ready. >I am advocating to stop porting features and start releasing them. >If .23 is Federation + Yarn, then 0.23 + HA is 0.24; plus PB - going >to 0.25, etc. > >Thought I should clarify what I mean by forward-going progress. >Hope it makes sense. > >Thanks, >--Konstantin > > >On Mon, Mar 19, 2012 at 2:56 PM, Doug Cutting <[EMAIL PROTECTED]> wrote: >> On 03/19/2012 02:47 PM, Arun C Murthy wrote: >>> This is against the Apache Hadoop release policy on major releases >>>i.e. only features deprecated for at least one release can be removed. >> >> In many case the reason this happened was that features were backported >> from trunk to 0.20 but not to 0.22. In other words, its no fault of the >> folks who were working on branch 0.22. So a related policy we might add >> to prevent such situations in the future might be that if you backport >> something from branch n to n-2 then you ought to also be required to >> backport it to branch n-1 and in general to all intervening branches. >> Does that seem sensible? >> >> Doug
-
Re: Naming of Hadoop releasesKonstantin Shvachko 2012-03-22, 08:53
Hi everybody,
I think it is important that people vote on Arun's proposal.
- Whether it is binding or not.
- Whether or not you think your vote affects the results.
- We have a huge user base, and should be hearing from you.
- It would be very useful to understand your opinions or confusions about renaming of the branches, indicated by voting.
- I am particularly interested in the value of the 0.22 branch for the user community.
I believe there is life on the outskirts of the Universe with the restaurant at the End of it. It is good to know the actual scale. Thanks, --Konstantin