-
[DISCUSSION] Release process
Owen O'Malley, 2010-03-15, 16:06

From our 21 experience, it looks like our old release strategy is failing. In looking around, I found that HTTPD's release strategy is extremely different and seems much more likely to produce usable releases. It is well worth reading, in my opinion.

http://httpd.apache.org/dev/release.html

-- Owen
-
Re: [DISCUSSION] Release process
Jeff Hammerbacher, 2010-03-15, 21:03

Hey Owen,

Which aspects of the HTTPD release strategy do you find most useful compared to the current Hadoop release strategy?

Thanks,
Jeff
-
Re: [DISCUSSION] Release process
Allen Wittenauer, 2010-03-24, 18:38

On 3/15/10 9:06 AM, "Owen O'Malley" <[EMAIL PROTECTED]> wrote:
> From our 21 experience, it looks like our old release strategy is
> failing.

Maybe this is a dumb question but... Are we sure it isn't the community failing?

From where I stand, the major committers (PMC?) have essentially forked Hadoop into three competing source trees. No one appears to be dedicated to helping the community release because the focus is on their own tree. Worse yet, two of these trees are publicly available with both sides pushing their own tree as vastly superior (against each other and against the official Apache branded one).

What are the next steps in getting this resolved? Is Hadoop-as-we-know-it essentially dead? What is going to prevent the fiasco that is 0.21 from impacting 0.22?

For me personally, I'm more amused than upset that 0.21 hasn't been released. But I'm less happy that there appears to be a focus on feature additions rather than getting some of the 0.21 blockers settled (I'm assuming here that most of the 0.21 blockers apply to 0.22 as well).

I don't think retroactively declaring 0.20 as 1.0 is going to make the situation any better. [In fact, I believe it will make it worse, since it gives an external impression that 0.20 is somehow stable at all levels. We all know this isn't true.]
-
Re: [DISCUSSION] Release process
Brian Bockelman, 2010-03-24, 20:27

Hey Allen,

Your post provoked a few thoughts:
1) Hadoop is a large, but relatively immature project (as in, there's still a lot of major features coming down the pipe). If we wait to release on features, especially when there are critical bugs, we end up with a large number of patches between releases. This ends up encouraging custom patch sets and custom distributions.
2) The barrier for patch acceptance is high, especially for opportunistic developers. This is a good thing for code quality, but not for getting patches in in a timely manner. This means that there are a lot of 'mostly good' patches out there in JIRA which have not landed. This again encourages folks to develop their own custom patch sets.
3) We make only bugfixes for past minor releases, meaning the stable Apache release is perpetually behind in features, even features that are not core.

Not sure how to best fix these things. One possibility:
a) Have a stable/unstable series (0.19.x is unstable, 0.20.x is stable, 0.21.x is unstable). For the unstable releases, lower the bar for code acceptance for less-risky patches.
b) Combine that with a time-based release for bugfixes (and non-dangerous features?) in order to keep the feature releases "fresh".

(a) aims to tackle problems (1) and (2). (b) aims to tackle (3).

This might not work for everything. If I had a goal, it would be to decrease the number of active distributions from 3 to 2 - otherwise you end up spending far too much time consensus building.

Just a thought from an outside, relatively content observer,

Brian
-
Re: [DISCUSSION] Release process
Tom White, 2010-03-24, 23:25

I agree that getting the release process restarted is of utmost importance to the project. To help make that happen I'm happy to volunteer to be a release manager for the next release. This will be the first release post-split, so there will undoubtedly be some issues to work out. I think the focus should be on getting an alpha release out, so I suggest we create a new 0.21 branch from trunk, then spend time fixing blockers (which will be a superset of the existing 0.21 blockers).

Cheers,
Tom
-
Re: [DISCUSSION] Release process
Jeff Hammerbacher, 2010-03-25, 01:17

Hey Tom,

That sounds like a great idea. +1.

Thanks,
Jeff
-
Re: [DISCUSSION] Release process
Steve Loughran, 2010-03-25, 13:22

Tom White wrote:
> I agree that getting the release process restarted is of utmost importance to the project. To help make that happen I'm happy to volunteer to be a release manager for the next release. This will be the first release post-split, so there will undoubtedly be some issues to work out. I think the focus should be on getting an alpha release out, so I suggest we create a new 0.21 branch from trunk, then spend time fixing blockers (which will be a superset of the existing 0.21 blockers).
>
> Cheers,
> Tom

My thoughts

* The installed base creates its own inertia: if you have 2PB of data you care about, you don't want to be bleeding edge.
* That installed base creates resistance to getting patches back in. I think everyone -myself included- has stuff they want to get into the system, but everyone who doesn't see the need for a feature is nervous.
* The branches reassure people of stability, but increase the cost of changes and fixes too. There's more pressure to backport stuff, and this makes big reorgs hard.
* It makes takeup of new features (like the new FS APIs) harder. You have to consider how long 0.20.x will stay around, so you focus on the stuff that's there.
* I worry that the partitioning of the project is making inertia worse too: it is harder to co-ordinate changes across the code, and the code is still tightly coupled enough that this matters.

My suggestions

+1 to the idea of stable/unstable, though I'd like to get some stuff into 0.22, and with avro and security, that's going to be pretty traumatic too. The move from 0.20 to 0.21 will be much less painful.
+1 to Tom being release manager.
+1 to some session where everyone brings their patches up to date and we push them into trunk, to bring the branches back in line. If you want to do some session in the bay area then perhaps those of us outside it can skype in or IPC, do a distributed triage run and really work hard to get stuff in and working. We should do this before cutting the 0.21 branch. I will sort my stuff out asap.

Now, what cluster time -real and virtual- do people have to offer Tom? I may -repeat may- be able to sort out some OpenCirrus machines, or transient VMs with limited per-VM storage in my little 1000 node datacentre, though networking complexity gets in the way there. He won't have direct access to it.

-Steve
-
Re: [DISCUSSION] Release process
Konstantin Boudnik, 2010-03-26, 18:13

On Wed, Mar 24, 2010 at 01:27PM, Brian Bockelman wrote:
> a) Have a stable/unstable series (0.19.x is unstable, 0.20.x is stable, 0.21.x is unstable). For the unstable releases, lower the bar for code acceptance for less-risky patches.

I can see how the different criteria of patch acceptance might be an incentive for different patch sets between unstable and stable releases. Thus, features will have to be manually tracked and ported between releases.

Cos
-
Re: [DISCUSSION] Release process
Owen O'Malley, 2010-03-26, 18:43

On Mar 24, 2010, at 4:25 PM, Tom White wrote:
> I agree that getting the release process restarted is of utmost importance to the project. To help make that happen I'm happy to volunteer to be a release manager for the next release. This will be the first release post-split, so there will undoubtedly be some issues to work out. I think the focus should be on getting an alpha release out, so I suggest we create a new 0.21 branch from trunk, then spend time fixing blockers (which will be a superset of the existing 0.21 blockers).

That's great, Tom. Thanks for stepping up. Given that you're proposing rebasing 0.21, what is your preferred strategy? Are you going to pick a feature freeze date? Or propose a more httpd-like process where you cut the branch and then control what goes in?

Also note that the security work is not done in trunk. That isn't a blocker of course, but I don't want anyone to have unrealistic expectations that a rebased 0.21 would include all of the security work. (The work is done in our Yahoo 0.20.100 branch that we should push to github soon. We will also be forward porting the patches into trunk over the next month.)

Thanks,
Owen
-
Re: [DISCUSSION] Release process
Stack, 2010-03-26, 19:10

Getting a release out is critical. Otherwise, IMO, the project is dead but for the stiffening.

Thanks Tom for stepping up to play the RM role for a 0.21.

Regarding Steve's call for what we can offer Tom to help along the release, the little flea hbase can test its use case on 0.21.0 candidates and we can probably take on a few of the HDFS blockers. I also like Steve's suggestion of a council to figure out what makes the 0.21 cut (We're talking security and avro in 0.22, not 0.21, right?).

Allen in his note raises another issue beyond the release blockage that I believe warrants further discussion. The "forks" maintained by the big contributors currently cloud (undermine?) the Apache release, and the amount of patch wrangling, and the pain involved in it, is a friction on forward progress, especially as versions deviate further. Perhaps this state is inevitable when the stakes are this high, where there are new releases rolled out across thousands of machines carrying biz-critical data that cannot fail. Having the Apache project release reliably on a schedule should help, especially if posted fixes get reviewed and committed. Formally adopting the stable/unstable labeling could help too.

I'd be interested in more discussion of the latter point and in what the community thinks we can do to address the current, IMO, stasis-making set of circumstances.

Thanks,
St.Ack
-
Re: [DISCUSSION] Release process
Chris Douglas, 2010-03-27, 00:03

> Thanks Tom for stepping up to play the RM role for a 0.21.

+1 Thanks Tom.

> Regarding Steve's call for what we can offer Tom to help along the release, the little flea hbase can test its use case on 0.21.0 candidates and we can probably take on a few of the HDFS blockers. I also like Steve's suggestion of a council to figure out what makes the 0.21 cut (We're talking security and avro in 0.22, not 0.21, right?).

A council may not move quickly enough to make 0.21 real on a reasonable timeframe. We need to vote on the rules we're going to follow, but one attribute of the httpd model - giving the RM considerable authority over what's in/out of a release - sounds like an efficient way to effect a quick release of this long-suffering branch. We also need to vote on backwards compatibility requirements for 0.21, whatever form it takes (rebase or existing), since most seem to be regarding it as unstable or not-quite-major.

> Allen in his note raises another issue beyond the release blockage that I believe warrants further discussion. The "forks" maintained by the big contributors currently cloud (undermine?) the Apache release, and the amount of patch wrangling, and the pain involved in it, is a friction on forward progress, especially as versions deviate further. Perhaps this state is inevitable when the stakes are this high, where there are new releases rolled out across thousands of machines carrying biz-critical data that cannot fail.

The Apache Hadoop community needs to have some honest discussions about its priorities. The branches maintained and published by large contributors are not harmful in principle, but the distance from Apache not only imposes a burden on those involved in development, it can also fracture the user base. It should be a goal to minimize the delta between distributions in the core framework and encourage all players to stay current with the Apache project.

> Having the Apache project release reliably on a schedule should help, especially if posted fixes get reviewed and committed. Formally adopting the stable/unstable labeling could help too.

A fixed schedule is unrealistic. Six month releases just cause pain for whoever is testing it, and if nobody is motivated to push the branch to release (as in 0.21), then committers and contributors pay the branch overhead purposelessly while users are perpetually confused on its status. The httpd model - where anyone can call a release, but trunk is unaffected - seems fair. It also keeps all parties focused on keeping trunk stable enough to release on their own criteria.

But we should discuss "future" separately from 0.21. -C
-
Re: [DISCUSSION] Release process
Tom White, 2010-03-27, 00:26

On Fri, Mar 26, 2010 at 11:43 AM, Owen O'Malley <[EMAIL PROTECTED]> wrote:
>
> On Mar 24, 2010, at 4:25 PM, Tom White wrote:
>
>> I agree that getting the release process restarted is of utmost importance to the project. To help make that happen I'm happy to volunteer to be a release manager for the next release. This will be the first release post-split, so there will undoubtedly be some issues to work out. I think the focus should be on getting an alpha release out, so I suggest we create a new 0.21 branch from trunk, then spend time fixing blockers (which will be a superset of the existing 0.21 blockers).
>
> That's great, Tom. Thanks for stepping up. Given that you're proposing rebasing 0.21, what is your preferred strategy? Are you going to pick a feature freeze date? Or propose a more httpd-like process where you cut the branch and then control what goes in?

A bit of both: I was thinking about creating a new branch soon, on a feature freeze date (in a couple of weeks or so, ideally), then deciding what the blockers are for the release. Some of the issues currently marked as 0.21 blockers may not actually block the release (e.g. a documentation improvement), and there will be other issues that will turn out to be blockers. For example, we need to be confident that the 0.21 API is compatible with the 0.20 API, so getting good JDiff output and some compatibility tests (MAPREDUCE-1637) are important IMO.

> Also note that the security work is not done in trunk. That isn't a blocker of course, but I don't want anyone to have unrealistic expectations that a rebased 0.21 would include all of the security work. (The work is done in our Yahoo 0.20.100 branch that we should push to github soon. We will also be forward porting the patches into trunk over the next month.)

Thanks for pointing this out, Owen. One of my motivations for doing this is to exercise the post-split release process, so this should be seen as an alpha release, where some features (like security) may be incomplete or potentially unstable.

And thanks for the offers to do testing (Steve and Stack), that's very helpful. I suggest we have a wiki page or similar so that folks can record the tests they ran and the cluster details for a release candidate.

Cheers,
Tom
-
Re: [DISCUSSION] Release process
Steve Loughran, 2010-03-29, 16:23

Stack wrote:
> Getting a release out is critical. Otherwise, IMO, the project is dead but for the stiffening.
>
> Thanks Tom for stepping up to play the RM role for a 0.21.
>
> Regarding Steve's call for what we can offer Tom to help along the release, the little flea hbase can test its use case on 0.21.0 candidates and we can probably take on a few of the HDFS blockers. I also like Steve's suggestion of a council to figure out what makes the 0.21 cut (We're talking security and avro in 0.22, not 0.21, right?).
>
> Allen in his note raises another issue beyond the release blockage that I believe warrants further discussion. The "forks" maintained by the big contributors currently cloud (undermine?) the Apache release, and the amount of patch wrangling, and the pain involved in it, is a friction on forward progress, especially as versions deviate further. Perhaps this state is inevitable when the stakes are this high, where there are new releases rolled out across thousands of machines carrying biz-critical data that cannot fail. Having the Apache project release reliably on a schedule should help, especially if posted fixes get reviewed and committed. Formally adopting the stable/unstable labeling could help too.

The ASF never mandates a release schedule; it can be too hard to meet. The main thing is "making progress" and having some plan to release updated versions.

But if you don't have frequent releases, the pressure to backport increases.

-steve
-
Re: [DISCUSSION] Release processDoug Cutting 2010-03-30, 22:40
Tom White wrote:
> I think the focus should be on getting an alpha release > out, so I suggest we create a new 0.21 branch from trunk Another release we might consider is 1.0 based on 0.20. We'd then have releases that correspond to what folks are actually using in production. This would also rationalize our release numbering, since many have expressed that 0.20 APIs should be treated as 1.0 APIs. A 1.0 release based off 0.20 would give us a chance to state more precisely the 1.0 API that we intend to support long-term. For example, we might un-mark the old mapreduce APIs as deprecated in a 1.0 release, and mark the new mapreduce APIs as experimental and unstable there. Programs that use only public stable features in 1.0 could be then guaranteed to run for a long-time hence. It would also be good to get HDFS-200 into 1.0. That might be the fastest route to providing a stable append for HBase. Y!'s 0.20+security could become the basis of a 1.1 release. The next release from trunk might then be called 2.0 alpha. It would support 1.0 APIs, but they'd be deprecated in favor of newer API for mapreduce and filesystems. We could pursue releasing 1.0 and 2.0 alpha in parallel. Thoughts? Doug
-
Re: [DISCUSSION] Release processChris K Wensel 2010-03-30, 23:04
> A 1.0 release based off 0.20 would give us a chance to state more precisely the 1.0 API that we intend to support long-term. For example, we might un-mark the old mapreduce APIs as deprecated in a 1.0 release, and mark the new mapreduce APIs as experimental and unstable there. Programs that use only public stable features in 1.0 could be then guaranteed to run for a long-time hence.
> +1 ckw -- Chris K Wensel [EMAIL PROTECTED] http://www.concurrentinc.com
-
Re: [DISCUSSION] Release processOwen O'Malley 2010-03-31, 03:22
On Mar 30, 2010, at 3:40 PM, Doug Cutting wrote: > Another release we might consider is 1.0 based on 0.20. It is tempting and I think that 0.20 is *really* our 1.0, but I think re-labeling a release a year after it came out would be confusing. I think that we should change the rules so that the remaining 0.X releases are minor releases. That seems a relatively minor change and just means that we can't remove deprecated items until 1.0. I'll volunteer to be release manager for a release branched in November, which should be roughly 6 months after Tom's new 0.21 release. One possible release numbering would have the November release be 0.99 and a matching 1.0 that removes all of the deprecated methods. -- Owen
-
Re: [DISCUSSION] Release processCosmin Lehene 2010-03-31, 09:38
Hi,
I'm glad we're heading towards a release. We'd like to better understand some aspects regarding the release plan. What would be the tentative release schedule, and what affects particular releases? We could either continue with our current version or plan based on what's going to be released. I guess an upgrade decision process is mostly influenced by the amount of work required to update and test existing code for API changes balanced by the feature benefits from the upgrade, with aspects such as Data Integrity being blockers. Hence, we'd like to understand how the following things map on, or affect, the next releases (0.x, 1.x, 2.x) * Data Integrity (I guess this should be blocking for any release) * API compatibility (I understand this is the primary driver for the release major numbering) * High-level features (Append, Security, Avro, Rolling upgrades, etc. - this would map on a time basis) In our case we have code running on 0.21 and need "Append", with the current focus being Data Integrity. We can handle an API change, but are really concerned about Data Integrity. So, for us, fixing any blocking issues on hdfs-0.21 (potentially 0.22) and then keeping it stable would be the priority, regardless of whether this will be 0.X, 1.X, or 2.X. Depending on the result of this release schedule we might be able to stick with one of them or to fork for a while. Thanks for your help, Cosmin On Mar 31, 2010, at 6:22 AM, Owen O'Malley wrote: > > On Mar 30, 2010, at 3:40 PM, Doug Cutting wrote: > >> Another release we might consider is 1.0 based on 0.20. > > It is tempting and I think that 0.20 is *really* our 1.0, but I think > re-labeling a release a year after it came out would be confusing. > > I think that we should change the rules so that the remaining 0.X > releases are minor releases. That seems a relatively minor change and > just means that we can't remove deprecated items until 1.0. 
> > I'll volunteer to be release manager for a release branched in > November, which should be roughly 6 months after Tom's new 0.21 release. > > One possible release numbering would have the November release be 0.99 > and a matching 1.0 that removes all of the deprecated methods. > > -- Owen >
-
Re: [DISCUSSION] Release processAllen Wittenauer 2010-03-31, 11:42
On 3/30/10 8:22 PM, "Owen O'Malley" <[EMAIL PROTECTED]> wrote: > > On Mar 30, 2010, at 3:40 PM, Doug Cutting wrote: > >> Another release we might consider is 1.0 based on 0.20. > > It is tempting and I think that 0.20 is *really* our 1.0, but I think > re-labeling a release a year after it came out would be confusing. By "our" do you mean Yahoo!'s or Apache's? The fact that there are a *ton* of admin tool changes/fixes/additions in the Yahoo! Distribution of 0.20 (and quite a few in CDH) should be the big hint that Apache 0.20 is *not* 1.0. I think it would do the PMC well to actually play with Apache 0.20.2 and realize how much is missing vs. their in-house distributions.
-
Re: [DISCUSSION] Release processDoug Cutting 2010-03-31, 16:04
Owen O'Malley wrote:
> It is tempting and I think that 0.20 is *really* our 1.0, but I think > re-labeling a release a year after it came out would be confusing. I wasn't proposing just a re-labeling. I was proposing a new release, branched from 0.20 rather than trunk. We'd introduce some changes, after voting on each of course. Candidates are MAPREDUCE-1623 and MAPREDUCE-1650, to better clarify what's intended to be supported in 1.0, and HDFS-200, to make append reliable. Since we have not yet made a 0.21 release, this numbering would be consistent. It also naturally permits further 1.x releases that add features, like security. Doug
-
Re: [DISCUSSION] Release processDoug Cutting 2010-03-31, 16:06
Allen Wittenauer wrote:
> The fact that there are a *ton* > of admin tool changes/fixes/additions in the Yahoo! Distribution of 0.20 > (and quite a few in CDH) should be the big hint that Apache 0.20 is *not* > 1.0. Right. I'm proposing we make a 1.0 release that tries to match what folks are actually using in production and clarifies what APIs may be relied upon to be stable going forward. Doug
-
Re: [DISCUSSION] Release processKonstantin Shvachko 2010-03-31, 17:13
HDFS 0.20 does not have a reliable append.
Also it is (was last time I looked) incompatible with the 0.21 append HDFS-256. That wouldn't be a problem if that was the only incompatibility. But it's not. If 1.0 is re-labeled or re-branched from 0.20 we will have too many incompatibilities going into further releases so that we will have to call all of them major ones for the foreseeable future. I don't understand what is wrong with 0.21 released from 0.21? - Making a new release from trunk will take a long time to stabilize. - Branching out 0.20.x as 1.0 introduces too many incompatibilities. I would like to propose a straightforward release of 0.21 from the current 0.21 branch. --Konstantin On 3/31/2010 9:04 AM, Doug Cutting wrote: > Owen O'Malley wrote: >> It is tempting and I think that 0.20 is *really* our 1.0, but I think >> re-labeling a release a year after it came out would be confusing. > > I wasn't proposing just a re-labeling. I was proposing a new release, > branched from 0.20 rather than trunk. We'd introduce some changes, after > voting on each of course. Candidates are MAPREDUCE-1623 and > MAPREDUCE-1650, to better clarify what's intended to be supported in > 1.0, and HDFS-200, to make append reliable. > > Since we have not yet made a 0.21 release, this numbering would be > consistent. It also naturally permits further 1.x releases that add > features, like security. > > Doug >
-
Re: [DISCUSSION] Release processTom White 2010-03-31, 18:44
[Owen] > I think that we should change the rules so that the remaining
0.X releases are minor releases. +1 [Owen] > I'll volunteer to be release manager for a release branched in November, which should be roughly 6 months after Tom's new 0.21 release. That would be great. Thanks, Owen! [Doug] > I'm proposing we make a 1.0 release that tries to match what folks are actually using in production and clarifies what APIs may be relied upon to be stable going forward. A pre-requisite to doing a 0.21 release is identifying the public API that we intend to support going forward. To help enable this I intend to backport the InterfaceAudience annotations to 0.20 (HADOOP-5073, and associated JIRAs like HADOOP-6658) so we can run JDiff between the public 0.20 API and 0.21. I'll also write some basic compatibility tests to check that we can, for example, run MR programs on both unchanged (MAPREDUCE-1637). Rather than doing a 1.0 release from 0.20, perhaps it would be sufficient to run these tools against other Hadoop distributions to check compatibility. [Konstantin] > I don't understand what is wrong with 0.21 released from 0.21? [Konstantin] > - Making a new release from trunk will take long time to stabilize. The current 0.21 branch is 6 months behind trunk. Also, it's not clear that it is backwards compatible with 0.20. I'm volunteering to create a new 0.21 branch off trunk, then make a series of 0.21 releases, which would get progressively more stable. This work would hopefully act as a foundation for Owen's release in November. Here's what I propose doing: 1. Create a new 0.21 branch from trunk on Friday 16 April. 2. Identify and fix blockers. Compatibility is the major blocker. 3. Cut a release candidate, test, vote - repeat until a release candidate is agreed upon. Cheers, Tom On Wed, Mar 31, 2010 at 10:13 AM, Konstantin Shvachko <[EMAIL PROTECTED]> wrote: > HDFS 0.20 does not have a reliable append. > Also it is (was last time I looked) incompatible with the 0.21 append > HDFS-256. 
> That wouldn't be a problem if that was the only incompatibility. But it's > not. > > If 1.0 is re-labeled or re-branched from 0.20 we will have to many > incompatibilities > going into further releases so that we will have to call all of them major > ones > for the foreseeable future. > > I don't understand what is wrong with 0.21 released from 0.21? > > - Making a new release from trunk will take long time to stabilize. > - Branching out 0.20.x as 1.0 introduces too many incompatibilities. > > I would like to propose a straightforward release of 0.21 from current 0.21 > branch. > > --Konstantin > > > On 3/31/2010 9:04 AM, Doug Cutting wrote: >> >> Owen O'Malley wrote: >>> >>> It is tempting and I think that 0.20 is *really* our 1.0, but I think >>> re-labeling a release a year after it came out would be confusing. >> >> I wasn't proposing just a re-labeling. I was proposing a new release, >> branched from 0.20 rather than trunk. We'd introduce some changes, after >> voting on each of course. Candidates are MAPREDUCE-1623 and >> MAPREDUCE-1650, to better clarify what's intended to be supported in >> 1.0, and HDFS-200, to make append reliable. >> >> Since we have not yet made a 0.21 release, this numbering would be >> consistent. It also naturally permits further 1.x releases that add >> features, like security. >> >> Doug >> > >
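The compatibility check Tom describes, comparing the public 0.20 API against 0.21, amounts to a set difference over public signatures. A minimal Java sketch of that idea follows. It is not JDiff and does not use JDiff's interfaces; the class name `ApiDiffSketch` and the placeholder signatures are hypothetical, chosen only for illustration.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class ApiDiffSketch {

    /** Returns signatures present in the old API but missing from the new one. */
    public static Set<String> removedSignatures(Set<String> oldApi, Set<String> newApi) {
        Set<String> removed = new TreeSet<>(oldApi);
        removed.removeAll(newApi);
        return removed;
    }

    public static void main(String[] args) {
        // Placeholder signatures, not taken from any real Hadoop release.
        Set<String> oldApi = new TreeSet<>(Arrays.asList("Example.read()", "Example.write(String)"));
        Set<String> newApi = new TreeSet<>(Arrays.asList("Example.read()", "Example.append(String)"));

        // Under this sketch, a non-empty result would be treated as a compatibility blocker.
        Set<String> removed = removedSignatures(oldApi, newApi);
        System.out.println("Removed from public API: " + removed);
    }
}
```

Additions to the new API are deliberately ignored here, since only removals or changes to existing signatures would break programs written against the older release.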
-
Re: [DISCUSSION] Release processAmr Awadallah 2010-03-31, 20:34
If I may pitch in briefly here: believe it or not, there are a lot of
enterprises out there that think anything that isn't version 1.0 isn't worth considering, let alone deploying (doesn't make sense, but some people are like that). Hence, from a market adoption point of view, Apache Hadoop is currently hindered by that. So unless we think Hadoop isn't operationally ready yet (which I think we agree isn't the case), the sooner we adopt the 1.0 labeling the better, as we all want to see Hadoop adopted in more and more places. My 2 cents, -- amr On 3/31/2010 9:06 AM, Doug Cutting wrote: > Allen Wittenauer wrote: >> The fact that there are a *ton* >> of admin tool changes/fixes/additions in the Yahoo! Distribution of 0.20 >> (and quite a few in CDH) should be the big hint that Apache 0.20 is >> *not* >> 1.0. > > Right. I'm proposing we make a 1.0 release that tries to match what > folks are actually using in production and clarifies what APIs may be > relied upon to be stable going forward. > > Doug
-
Re: [DISCUSSION] Release processDoug Cutting 2010-03-31, 21:19
Konstantin Shvachko wrote:
> I would like to propose a straightforward release of 0.21 from current > 0.21 branch. That could be done too. Tom's volunteered to drive a release from trunk in a few weeks. Owen's volunteered to drive another release from trunk in about six months. Would you like to volunteer to drive a release from the current 0.21 branch? My latest proposal, a 1.0 branch based on 0.20, contains two questions: 1. Should we make an Apache release that more closely corresponds to what folks are using in production today (and will be using for a while yet)? 2. If we're considering the 0.20 mapreduce and filesystem APIs to be 1.0 APIs, and the new mapreduce and filesystem APIs to be 2.0 APIs, shouldn't our release numbering reflect that? Release numbers are fundamentally about compatibility declarations. We could instead change our compatibility rules to specifically mention certain release numbers, but that feels the wrong way around. Since we've not yet made a 0.21 release, we still have an opportunity to interject a 1.0 release with the semantics folks expect: its public APIs are stable. If there's support for this proposal, then I'd volunteer to drive it. I won't bother to pursue this unless folks think it is worthwhile, so, if you like it, please speak up. While a release itself cannot be vetoed and only requires a simple majority, we'll need to vote which patches would be applied to a 1.0 branch, and those code-change votes require consensus, so, vetos there would stop the process. So please also speak up if you strongly oppose this proposal. Doug
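Doug's point that release numbers are compatibility declarations can be made concrete. The Java sketch below assumes one possible rule, namely that public APIs stay compatible within a major version and may break only across a major bump. The class `ReleaseNumber` and its methods are hypothetical and are not part of Hadoop.

```java
public final class ReleaseNumber {
    private final int major;
    private final int minor;

    public ReleaseNumber(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    /** Parses strings such as "1.0" or "0.21"; components after the second are ignored. */
    public static ReleaseNumber parse(String text) {
        String[] parts = text.split("\\.");
        return new ReleaseNumber(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]));
    }

    /**
     * Assumed rule: code written against this release keeps working on a later
     * release only if the major number is unchanged and the minor number did not go down.
     */
    public boolean promisesCompatibilityWith(ReleaseNumber later) {
        return later.major == major && later.minor >= minor;
    }

    public static void main(String[] args) {
        ReleaseNumber oneZero = parse("1.0");
        // Same major number: compatibility is promised under the assumed rule.
        System.out.println(oneZero.promisesCompatibilityWith(parse("1.1")));
        // Major bump: no promise is made.
        System.out.println(oneZero.promisesCompatibilityWith(parse("2.0")));
    }
}
```

Under such a rule the promise is carried by the number itself, which is the alternative to writing compatibility rules that name specific releases.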
-
Re: [DISCUSSION] Release processKonstantin Shvachko 2010-04-01, 01:29
On 3/31/2010 2:19 PM, Doug Cutting wrote:
> Konstantin Shvachko wrote: >> I would like to propose a straightforward release of 0.21 from current >> 0.21 branch. > > That could be done too. Would you like to volunteer to drive a release from > the current 0.21 branch? I would if I could. I intended to volunteer to fix some blockers if a 0.21 release from the 0.21 branch happens. I think it is important to have the last "insecure" release. --Konstantin
-
Re: [DISCUSSION] Release processAndrew Purtell 2010-04-01, 16:33
Our org (Trend Micro) will be using an internal build based on 0.20 for at least the rest of this year. It is, really, already "1.0" from our point of view, the first ASF Hadoop release officially adopted into our production environment. I hope other users of Hadoop will speak up on this thread to provide valuable feedback. I do hear informally that we are far from alone in this, but I have no idea if we are in a majority or not.
We could have adopted 0.21 -- and would have preferred to for the HDFS improvements needed by HBase to provide data durability -- but due to the length of time 0.21 has remained in an unreleased state that window has closed for us. We had internal milestones to meet. Instead we have adopted a modified 0.20. There are others like us who are basing production HBase systems on 0.20 + HDFS-200, and a couple of other HDFS patches of lesser consequence which have also been backported thanks to the kind assistance of Cloudera, Facebook, the HDFS devs, and others. I hope this user's perspective has been useful. Best regards, - Andy > From: Doug Cutting [...] > My latest proposal, a 1.0 branch based on 0.20, contains > two questions: > > 1. Should we make an Apache release that more closely > corresponds to what folks are using in production today (and > will be using for a while yet)? > > 2. If we're considering the 0.20 mapreduce and filesystem > APIs to be 1.0 APIs, and the new mapreduce and filesystem > APIs to be 2.0 APIs, shouldn't our release numbering reflect > that? Release numbers are fundamentally about > compatibility declarations. We could instead change > our compatibility rules to specifically mention certain > release numbers, but that feels the wrong way around. > Since we've not yet made a 0.21 release, we still have an > opportunity to interject a 1.0 release with the semantics > folks expect: its public APIs are stable. > > If there's support for this proposal, then I'd volunteer to > drive it. I won't bother to pursue this unless folks > think it is worthwhile, so, if you like it, please speak > up. While a release itself cannot be vetoed and only > requires a simple majority, we'll need to vote which patches > would be applied to a 1.0 branch, and those code-change > votes require consensus, so, vetos there would stop the > process. So please also speak up if you strongly > oppose this proposal. > > Doug
-
Re: [DISCUSSION] Release processChris K Wensel 2010-04-01, 17:11
are we saying we will de-deprecate the stable APIs in .20, or make the new APIs introduced in .20 stable?
+1 on removing the deprecations on the stable APIs. On Mar 31, 2010, at 2:19 PM, Doug Cutting wrote: > Konstantin Shvachko wrote: >> I would like to propose a straightforward release of 0.21 from current 0.21 branch. > > That could be done too. Tom's volunteered to drive a release from trunk in a few weeks. Owen's volunteered to drive another release from trunk in about six months. Would you like to volunteer to drive a release from the current 0.21 branch? > > My latest proposal, a 1.0 branch based on 0.20, contains two questions: > > 1. Should we make an Apache release that more closely corresponds to what folks are using in production today (and will be using for a while yet)? > > 2. If we're considering the 0.20 mapreduce and filesystem APIs to be 1.0 APIs, and the new mapreduce and filesystem APIs to be 2.0 APIs, shouldn't our release numbering reflect that? Release numbers are fundamentally about compatibility declarations. We could instead change our compatibility rules to specifically mention certain release numbers, but that feels the wrong way around. Since we've not yet made a 0.21 release, we still have an opportunity to interject a 1.0 release with the semantics folks expect: its public APIs are stable. > > If there's support for this proposal, then I'd volunteer to drive it. I won't bother to pursue this unless folks think it is worthwhile, so, if you like it, please speak up. While a release itself cannot be vetoed and only requires a simple majority, we'll need to vote which patches would be applied to a 1.0 branch, and those code-change votes require consensus, so, vetos there would stop the process. So please also speak up if you strongly oppose this proposal. > > Doug -- Chris K Wensel [EMAIL PROTECTED] http://www.concurrentinc.com
-
Re: [DISCUSSION] Release processDoug Cutting 2010-04-01, 17:18
Chris K Wensel wrote:
> are we saying we will de-deprecate the stable APIs in .20, or make the new APIs introduced in .20 stable? > > +1 on removing the deprecations on the stable APIs. Yes. I too am +1 on removing deprecations in stable, public APIs in a 1.0 release. Code that uses only public 1.0 APIs and compiles without deprecation warnings against 1.0 is code we should support going forward. Doug
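The rule Doug states, that code using only public, stable, non-deprecated 1.0 APIs should be supported going forward, could be checked mechanically if APIs carried stability labels. The Java sketch below uses a locally defined `Stable` annotation as a stand-in. It is hypothetical and is not Hadoop's own annotation type, and the two example classes are placeholders.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class StabilitySketch {

    /** Hypothetical marker for APIs that are promised to stay compatible. */
    @Retention(RetentionPolicy.RUNTIME)
    public @interface Stable {}

    /** Placeholder for a "classic" API: labeled stable and not deprecated. */
    @Stable
    public static class ClassicApi {}

    /** Placeholder for a newer API that carries no stability promise yet. */
    public static class NewApi {}

    /** True if the class is labeled stable and is not marked deprecated. */
    public static boolean isSupportedGoingForward(Class<?> api) {
        return api.isAnnotationPresent(Stable.class)
                && !api.isAnnotationPresent(Deprecated.class);
    }

    public static void main(String[] args) {
        System.out.println("ClassicApi supported: " + isSupportedGoingForward(ClassicApi.class));
        System.out.println("NewApi supported: " + isSupportedGoingForward(NewApi.class));
    }
}
```

The check relies on both annotations being retained at runtime, which is why the stand-in declares `RetentionPolicy.RUNTIME`; `java.lang.Deprecated` already has runtime retention.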
-
Re: [DISCUSSION] Release processJay Booth 2010-04-01, 17:36
Thanks Tom and Owen for stepping up --
We're using 0.20.2 as effectively 1.0 here, too, so I think a 1.0 branch is a good idea that recognizes the status quo and deals with it, particularly for having a 1.0 that's pre-split and pre-security (big changes). Couple random thoughts: 1) I agree with marking the old API stable in 1.0, especially considering hive/pig/etc all use it... what do we do with the new API? Mark it "next-gen"? We have one random job that uses it and I bet that's the same for a lot of people. Leaving it there would be useful as far as not breaking existing code and allowing the frameworks to port over to the new API slowly and not have to maintain 2 branches... but then that starts to obligate us to fully support the new API on the 1.0 branch. 2) Does this come with a date for 2.0? Will there be a 1.9 or maybe a 2.0-alpha? I think it might be wise to try and deploy an alpha-2.0 branch a few places before making a firm commitment to "this is THE 2.0 API and directory layout". 3) What about append? Based on the amount of work that went into it, I'm assuming it's a very difficult backport... does this mean 1.0 will never support append? (if it can't, it can't). On Thu, Apr 1, 2010 at 1:11 PM, Chris K Wensel <[EMAIL PROTECTED]> wrote: > are we saying we will de-deprecate the stable APIs in .20, or make the new > APIs introduced in .20 stable? > > +1 on removing the deprecations on the stable APIs. > > On Mar 31, 2010, at 2:19 PM, Doug Cutting wrote: > > > Konstantin Shvachko wrote: > >> I would like to propose a straightforward release of 0.21 from current > 0.21 branch. > > > > That could be done too. Tom's volunteered to drive a release from trunk > in a few weeks. Owen's volunteered to drive another release from trunk in > about six months. Would you like to volunteer to drive a release from the > current 0.21 branch? > > > > My latest proposal, a 1.0 branch based on 0.20, contains two questions: > > > > 1. 
Should we make an Apache release that more closely corresponds to what > folks are using in production today (and will be using for a while yet)? > > > > 2. If we're considering the 0.20 mapreduce and filesystem APIs to be 1.0 > APIs, and the new mapreduce and filesystem APIs to be 2.0 APIs, shouldn't > our release numbering reflect that? Release numbers are fundamentally about > compatibility declarations. We could instead change our compatibility rules > to specifically mention certain release numbers, but that feels the wrong > way around. Since we've not yet made a 0.21 release, we still have an > opportunity to interject a 1.0 release with the semantics folks expect: its > public APIs are stable. > > > > If there's support for this proposal, then I'd volunteer to drive it. I > won't bother to pursue this unless folks think it is worthwhile, so, if you > like it, please speak up. While a release itself cannot be vetoed and only > requires a simple majority, we'll need to vote which patches would be > applied to a 1.0 branch, and those code-change votes require consensus, so, > vetos there would stop the process. So please also speak up if you strongly > oppose this proposal. > > > > Doug > > -- > Chris K Wensel > [EMAIL PROTECTED] > http://www.concurrentinc.com > >
-
Re: [DISCUSSION] Release processChris Douglas 2010-04-01, 17:38
> My latest proposal, a 1.0 branch based on 0.20, contains two questions:
> > 1. Should we make an Apache release that more closely corresponds to what > folks are using in production today (and will be using for a while yet)? > > 2. If we're considering the 0.20 mapreduce and filesystem APIs to be 1.0 > APIs, and the new mapreduce and filesystem APIs to be 2.0 APIs, shouldn't > our release numbering reflect that? Release numbers are fundamentally about > compatibility declarations. We could instead change our compatibility rules > to specifically mention certain release numbers, but that feels the wrong > way around. Since we've not yet made a 0.21 release, we still have an > opportunity to interject a 1.0 release with the semantics folks expect: its > public APIs are stable. > > If there's support for this proposal, then I'd volunteer to drive it. I > won't bother to pursue this unless folks think it is worthwhile, so, if you > like it, please speak up. While a release itself cannot be vetoed and only > requires a simple majority, we'll need to vote which patches would be > applied to a 1.0 branch, and those code-change votes require consensus, so, > vetos there would stop the process. So please also speak up if you strongly > oppose this proposal. Tom has volunteered to drive a 0.21 release based on trunk. Owen has volunteered to drive the release following that, which will follow Tom's by about six months. Doug, you're volunteering to drive a concurrent 1.0 release based on 0.20? What Owen and Tom have proposed- to ensure API compatibility in releases up to the 1.0 release- has the advantage of stabilizing the mapreduce and FileContext APIs in versions that can actually be deployed. It will also force the dev community to address the issues introduced by the project split, rather than continuing to focus on 0.20 by another name. Among these issues are some pretty basic tasks: implementing a coherent packaging/deployment story, sorting out testing and patch validation across the projects, and stabilizing trunk. 
Spending the next few months voting and arguing on which patches make it into "new" 0.20 (branched in 2008) instead of addressing these issues is *not* progress. I strongly oppose this. I agree with Konstantin about the viability of 0.21. While the aforementioned, basic issues need to be addressed for both trunk and 0.21- and perhaps duplicating this work on an ancient branch is not well spent- it is already being used. And trunk is not stable: rebasing from trunk now will be hell for committers and contributors. But these decisions need to be made by the release manager. BTW- a vote to adopt the RM role outlined by the httpd project (the long-lost origin of this thread) has not started. If not an httpd-style RM, then what have Tom, Owen, and Doug volunteered to do? While we're at it, perhaps the vote on adopting bylaws needs to be restarted. -C
-
Re: [DISCUSSION] Release processDoug Cutting 2010-04-01, 17:50
Chris Douglas wrote:
> Spending the next few months voting and arguing on which > patches make it into "new" 0.20 (branched in 2008) instead of > addressing these issues is *not* progress. I strongly oppose this. If it takes months, it is a failure. It should take weeks, if that. Thus far the changes suggested for a 1.0 branch are: - de-deprecate "classic" mapred APIs (no Jira issue yet) - add HDFS-200 (improved append) - add HADOOP-6668 & MAPREDUCE-1623 (audience and stability annotations) - add MAPREDUCE-1650 (exclude private elements from javadoc) Are there other specific issues folks would like to see in this? We could, e.g., set a 1-week deadline for proposals, 1 week for discussion, and one week for voting, and roll a candidate in three weeks. Would you strongly oppose such a 3-week process? Doug
-
Re: [DISCUSSION] Release processMattmann, Chris A 2010-04-01, 18:24
Hi Guys,
To throw in my 2 cents: it would be really nice to get out a 1.0 branch based off of 0.20 - it's not perfect, but releases never are. That's why you can make more of them. :) In terms of the significance of the 1.0 labeling, I think it's important for adoption. I was telling someone at JPL about Hadoop in 2006/07 when it sprung out of Nutch, and there was concern since it was still in the 0.x.y stage. Last year, revisiting Hadoop for some cloud experiments here also yielded the same 0.x.y labeling, despite the tremendous growth in stability, features, and fixes. This seemed to confuse some of the users and project members here. Cheers, Chris On 4/1/10 10:50 AM, "Doug Cutting" <[EMAIL PROTECTED]> wrote: > Chris Douglas wrote: >> Spending the next few months voting and arguing on which >> patches make it into "new" 0.20 (branched in 2008) instead of >> addressing these issues is *not* progress. I strongly oppose this. > > If it takes months, it is a failure. It should take weeks, if that. > > Thus far the changes suggested for a 1.0 branch are: > - de-deprecate "classic" mapred APIs (no Jira issue yet) > - add HDFS-200 (improved append) > - add HADOOP-6668 & MAPREDUCE-1623 (audience and stability annotations) > - add MAPREDUCE-1650 (exclude private elements from javadoc) > > Are there other specific issues folks would like to see in this? We > could, e.g., set a 1-week deadline for proposals, 1 week for discussion, > and one week for voting, and roll a candidate in three weeks. > > Would you strongly oppose such a 3-week process? > > Doug > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Chris Mattmann, Ph.D. 
Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: [EMAIL PROTECTED] WWW: http://sunset.usc.edu/~mattmann/ ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Adjunct Assistant Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-
Re: [DISCUSSION] Release processTodd Lipcon 2010-04-01, 18:26
On Thu, Apr 1, 2010 at 10:50 AM, Doug Cutting <[EMAIL PROTECTED]> wrote:
> Chris Douglas wrote: > >> Spending the next few months voting and arguing on which >> patches make it into "new" 0.20 (branched in 2008) instead of >> addressing these issues is *not* progress. I strongly oppose this. >> > > If it takes months, it is a failure. It should take weeks, if that. > > Thus far the changes suggested for a 1.0 branch are: > - de-deprecate "classic" mapred APIs (no Jira issue yet) > - add HDFS-200 (improved append) > With HDFS-200 we'd also need HDFS-142, and potentially other fixes yet to be determined (this append work based on 200 is still ongoing). I don't think it will be "stable release" quality within a few weeks. > - add HADOOP-6668 & MAPREDUCE-1623 (audience and stability annotations) > - add MAPREDUCE-1650 (exclude private elements from javadoc) > > Are there other specific issues folks would like to see in this? We could, > e.g., set a 1-week deadline for proposals, 1 week for discussion, and one > week for voting, and roll a candidate in three weeks. > > Would you strongly oppose such a 3-week process? > > Doug > -- Todd Lipcon Software Engineer, Cloudera
-
Re: [DISCUSSION] Release processDoug Cutting 2010-04-01, 18:44
Todd Lipcon wrote:
> With HDFS-200 we'd also need HDFS-142 Good to know. I have to admit to being puzzled by HDFS-200, since Nicholas resolved it as a duplicate on 7 January, yet Dhruba's continued to post patches to it. Dhruba, Stack: do you have any thoughts on the appropriateness of making a release with HDFS-200 & HDFS-142? > and potentially other fixes yet to be > determined (this append work based on 200 is still ongoing). > I don't think > it will be "stable release" quality within a few weeks. I assume that HDFS-200 as-is does more good than harm, no? Also, 1.0.0 doesn't need to be flawless. If we identify critical bugs after its release, then we'll make a 1.0.1 release. We might even call the first 1.0 release something like 1.0.0 alpha. That said, I do believe it will still be stable sooner than the release from trunk. Doug
-
Re: [DISCUSSION] Release processChris Douglas 2010-04-01, 20:59
> Thus far the changes suggested for a 1.0 branch are:
> - de-deprecate "classic" mapred APIs (no Jira issue yet) Why? Tom and Owen's proposal preserves compatibility with the deprecated FileSystem and mapred APIs up to 1.0. After Tom cuts a release- from either the 0.21 branch or trunk- then issues related to missing mapred.lib classes, partial implementations, etc. are ameliorated and they actually become usable. Telling users to ignore them and use the classic APIs only deepens our debt. I don't mind releasing 1.0 with the classic APIs. Given the installed base, it's probably required. But let's not kill the new APIs by calling them "experimental," thereby granting the old ones "official" status at the moment the new ones become viable. > - add HDFS-200 (improved append) > - add HADOOP-6668 & MAPREDUCE-1623 (audience and stability annotations) > - add MAPREDUCE-1650 (exclude private elements from javadoc) OK. From some previous messages, I thought you were proposing some mix of 0.20 + security + HDFS-200 + et al., to better reflect what many run in production, possibly spreading that backporting work over several 1.x releases. This comparably meager set- with a vote on HDFS-200- could easily be 0.20.3, plus a set of bug fixes Todd and I have been assembling. > Would you strongly oppose such a 3-week process? Having spent 2009 in the shadow of 0.20, I oppose any decision that prevents Apache from releasing the last year of work, or backporting existing work *again* onto that branch. With 0.21 finally coming out, a line of 1.x releases based on 0.20 would kneecap Owen and Tom's effort to restart the project. -C
-
Re: [DISCUSSION] Release process
Allen Wittenauer 2010-04-01, 21:31
On 4/1/10 2:15 PM, "Mattmann, Chris A (388J)" <[EMAIL PROTECTED]> wrote: > In terms of the significance of the 1.0 labeling, I think it's important for > adoption. Companies wanting a 1.0 product could always pay Cloudera and get a v2 product. ;)
-
Re: [DISCUSSION] Release process
Mattmann, Chris A 2010-04-01, 21:36
LOL, I want a v100! :)
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [EMAIL PROTECTED]
WWW: http://sunset.usc.edu/~mattmann/
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
-
Re: [DISCUSSION] Release process
Doug Cutting 2010-04-01, 21:38
Chris Douglas wrote:
>> - de-deprecate "classic" mapred APIs (no Jira issue yet) > > Why? So that folks can be told that if their code compiles without deprecation warnings against 1.0 then it should work for all 1.x releases. > I don't mind releasing 1.0 with the classic APIs. Given the installed > base, it's probably required. But let's not kill the new APIs by > calling them "experimental," thereby granting the old ones "official" > status at the moment the new ones become viable. I was thinking that the new APIs should be 'public evolving' in 1.0. The classic APIs would be 'public stable'. Unless we don't want to reserve the right to still evolve the new APIs between now and 2.0. > OK. From some previous messages, I thought you were proposing some mix > of 0.20 + security + HDFS-200 + et al., to better reflect what many > run in production, possibly spreading that backporting work over > several 1.x releases. I did suggest that it would be good to subsequently release a version of Y!'s 0.20-based security patches as a 1.1 release. That's where Y! will first qualify security, and it seems a shame not to release that version. But perhaps this will prove impractical for some reason. > This comparably meager set- with a vote on > HDFS-200- could easily be 0.20.3, plus a set of bug fixes Todd and I > have been assembling. It could indeed instead be named 0.20.3, but if we agree that this (clarified with Tom's annotations) establishes the 1.0 API, then it would be good to number it as such, no? >> Would you strongly oppose such a 3-week process? > > Having spent 2009 in the shadow of 0.20, I oppose any decision that > prevents Apache from releasing the last year of work, or backporting > existing work *again* onto that branch. I don't see that this would prevent or discourage any other release. Nor does it require you to backport anything. Any backporting would be voluntary. 
Tom's privately told me he doesn't expect it to be difficult to backport HADOOP-6668 & MAPREDUCE-1623 (stability annotations) or MAPREDUCE-1650 (exclude private from javadoc), and I'm willing to backport those if he doesn't. > With 0.21 finally coming out, > a line of 1.x releases based on 0.20 would kneecap Owen and Tom's > effort to restart the project. How so? It seems you do oppose this proposal. Would you veto code changes required to make such a release with a technical rationale? Would you vote -1 in the (majority-based) release vote? Doug
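For readers unfamiliar with the 'public stable' versus 'public evolving' distinction in this message, the following is a minimal Java sketch of how such labels could be attached to classes and queried. The names `PublicApi`, `Stability`, `ClassicJobApi`, `NewJobApi`, and `promisedAcrossMinorReleases` are hypothetical stand-ins invented for this illustration; they are not the annotations added by HADOOP-6668 or MAPREDUCE-1623.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class StabilityDemo {
    /** Hypothetical stability levels, loosely following the terms used in the thread. */
    public enum Stability { STABLE, EVOLVING }

    /** Hypothetical marker for a class that is part of the public API. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface PublicApi {
        Stability value();
    }

    /** Stands in for a "classic" API labelled public and stable. */
    @PublicApi(Stability.STABLE)
    public static class ClassicJobApi {}

    /** Stands in for a new API labelled public but still evolving. */
    @PublicApi(Stability.EVOLVING)
    public static class NewJobApi {}

    /** No annotation at all: treated here as private to the project. */
    public static class InternalHelper {}

    /** True only for classes marked public and stable (an assumed policy for this sketch). */
    public static boolean promisedAcrossMinorReleases(Class<?> type) {
        PublicApi label = type.getAnnotation(PublicApi.class);
        return label != null && label.value() == Stability.STABLE;
    }

    public static void main(String[] args) {
        System.out.println(promisedAcrossMinorReleases(ClassicJobApi.class));
        System.out.println(promisedAcrossMinorReleases(NewJobApi.class));
        System.out.println(promisedAcrossMinorReleases(InternalHelper.class));
    }
}
```

Under this assumed policy, only the class labelled stable carries a compatibility promise within a major release line; the evolving and unlabelled classes do not.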
-
Re: [DISCUSSION] Release process
Chris Douglas 2010-04-02, 02:23
>>> - de-deprecate "classic" mapred APIs (no Jira issue yet)
>> >> Why? > > So that folks can be told that if their code compiles without deprecation > warnings against 1.0 then it should work for all 1.x releases. Deprecation warnings aren't only fair notice that the API may go away. The classic FileSystem and mapred APIs may never be purged if a compelling backwards compatibility story is developed. But without that solution, those applications may, someday, break. Until then, the deprecation warnings serve not only to steer developers away from code removed in that hypothetical situation, but also identify those sections as not actively developed. I'm pretty sure Thread::destroy still works and it was deprecated in what, 1.1? Deprecation is a signal that the development effort *may* proceed at the expense of these APIs, whether in performance, versatility, or- the most extreme case- removal. Nobody will harm users of these APIs without justifying why a solution avoiding it is worse. >> I don't mind releasing 1.0 with the classic APIs. Given the installed >> base, it's probably required. But let's not kill the new APIs by >> calling them "experimental," thereby granting the old ones "official" >> status at the moment the new ones become viable. > > I was thinking that the new APIs should be 'public evolving' in 1.0. The > classic APIs would be 'public stable'. Unless we don't want to reserve the > right to still evolve the new APIs between now and 2.0. The new APIs are unusable in the 0.20-based 1.0. They'd be added- at high expense- in 1.2 at the earliest in the structure you've proposed, since FileContext and mapreduce.lib are only in the 0.21 branch. Realistically, 2.0- which is what Tom is releasing in your model, right?- is the first time anyone will consider the new APIs. By that time, we'll have a larger installed base on the classic APIs, attracted by the 1.0 label. And the proposal is to cut this 1.0 release concurrently with a 2.0 alpha? 
A 0.20-based 1.0 will undermine the new release, again, just as its payload becomes viable. > I did suggest that it would be good to subsequently release a version of > Y!'s 0.20-based security patches as a 1.1 release. That's where Y! will > first qualify security, and it seems a shame not to release that version. > But perhaps this will prove impractical for some reason. Re-release it in Apache? Why spend the effort to repackage an older version with fewer features and inferior performance when most of the work is already in trunk? > It could indeed instead be named 0.20.3, but if we agree that this > (clarified with Tom's annotations) establishes the 1.0 API, then it would be > good to number it as such, no? I continue to disagree. That the methods are not removed in 1.0 does not establish them as "the 1.0 API". Nobody has advocated for their removal- because it would be ruinous to users- but that stance doesn't require a commitment to those APIs as the only stable ones, particularly over the APIs designed for backwards compatibility. > I don't see that this would prevent or discourage any other release. Nor > does it require you to backport anything. Any backporting would be > voluntary. Tom's privately told me he doesn't expect it to be difficult to > backport HADOOP-6668 & MAPREDUCE-1623 (stability annotations) or > MAPREDUCE-1650 (exclude private from javadoc), and I'm willing to backport > those if he doesn't. It would require committers and contributors to backport bugs fixed in 2.0 to 1.x. This would not be a voluntary burden borne only by the willing. Calling 0.20 the basis of 1.0 imposes an even longer life for that branch that must be endured by everyone working on the project. And the delta between these releases is not trivial. > It seems you do oppose this proposal. Would you veto code changes required > to make such a release with a technical rationale? Would you vote -1 in the > (majority-based) release vote? I've said plainly that I oppose it. 
I don't know what you mean by vetoing the required code changes. Are you suggesting that I would sabotage this work by blocking issues from being committed to the release branch? And yes: right now, I would vote -1 on the release. Speaking of the release vote process, I renew my request that we formalize both the RM role and the bylaws. -C
-
Re: [DISCUSSION] Release process
Amr Awadallah 2010-04-02, 04:05
> Companies wanting a 1.0 product could always pay Cloudera and get a v2 product.

lol :) good point Allen, let's please *not* adopt a 1.0 labeling for Apache Hadoop :)

Seriously though, to keep my previous comment about 1.0 labeling from being misinterpreted: I think the 1.0 labeling is important, but it is much more *urgent* and *important* to get the release cycle back in order, and we should focus on getting that done first.

-- amr
-
Re: [DISCUSSION] Release process
Dhruba Borthakur 2010-04-02, 04:31
We have been testing the HDFS append code for 0.20 (using HDFS-200, HDFS-142), but I believe it is not ready for production yet. I am guessing that there would be another two months of testing before I would classify 0.20.3 + HDFS-200 as production quality.

HDFS-200 touches code paths that would get triggered even if the append-sync feature is not used, hence I would be hesitant to put it in any production release before additional testing is done (it should be ready within one to two months from now).

thanks, dhruba
-
Re: [DISCUSSION] Release process
Owen O'Malley 2010-04-02, 05:33
On Apr 1, 2010, at 10:50 AM, Doug Cutting wrote:
> If it takes months, it is a failure. It should take weeks, if that.

On Apr 1, 2010, at 9:31 PM, Dhruba Borthakur wrote:
> We have been testing the HDFS append code for 0.20 (using HDFS-200,
> HDFS-142), but I believe it is not ready for production yet. I am
> guessing
> that there would be another two months of testing before I would
> classify
> 0.20.3 + HDFS-200 as production quality.

Even before I saw Dhruba's message, it seemed that Doug was vastly underestimating the time to get a usable release out the door. Heaven knows that the last time we tried to get 0.21 ready, I badly underestimated the amount of work, to the point where it missed the window where Yahoo could have deployed it.

I strongly applaud Tom's work at trying to get a releasable 0.21 based on the current trunk. I absolutely don't want a 0.20-based release and/or branch undercutting efforts to fix trunk.

In my experience with releasing Hadoop, the bare minimum of scale testing is a couple of weeks on 500 nodes (and more is far better) with a team of people testing it. I think that releasing a 1.0 that has never been tested at scale would be disastrous.

If Tom gets a rebased 0.21 out the door in the summer, that would put trunk into good shape to be the foundation of a new release (0.22 aka 1.0) that is cut at the end of the year.

-- Owen
-
Re: [DISCUSSION] Release process
Daniel Templeton 2010-04-02, 13:52
From the Java SE 7 JavaDocs:
> A program element annotated @Deprecated is one that programmers are discouraged from using, typically because it is dangerous, or because a better alternative exists. Compilers warn when a deprecated program element is used or overridden in non-deprecated code.

and from the javadoc page:

> A deprecated API is one that you are no longer recommended to use, due to changes in the API. While deprecated classes, methods, and fields are still implemented, they may be removed in future implementations, so you should not use them in new code, and if possible rewrite old code not to use them.

So, yes, deprecation is just a warning to avoid these APIs, but deprecation is a stronger statement than you're portraying. It's not fair notice that the API may go away. It's final notice that the API should go away but for backward compatibility reasons it can't. Deprecated := don't use. You shouldn't deprecate an API unless there is an alternative or unless its use is actually dangerous.

Daniel
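As a small illustration of the deprecation mechanics under debate, here is a hedged Java sketch: a deprecated method that remains fully functional and simply delegates to its replacement. The class and method names (`DeprecationDemo`, `submitClassic`, `submitNew`) are invented for this example and do not correspond to any Hadoop API.

```java
import java.lang.reflect.Method;

public class DeprecationDemo {
    /**
     * Stands in for a "classic" API kept for backwards compatibility.
     *
     * @deprecated use {@link #submitNew(String)} instead.
     */
    @Deprecated
    public static String submitClassic(String job) {
        // Still implemented: deprecation discourages use, it does not remove behaviour.
        return submitNew(job);
    }

    /** Stands in for the preferred replacement API. */
    public static String submitNew(String job) {
        return "submitted:" + job;
    }

    /** Reports whether the named one-argument method carries @Deprecated. */
    public static boolean isDeprecated(String methodName) throws NoSuchMethodException {
        Method method = DeprecationDemo.class.getMethod(methodName, String.class);
        // @Deprecated is retained at runtime, so reflection can see it.
        return method.isAnnotationPresent(Deprecated.class);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(submitClassic("wordcount"));
        System.out.println(isDeprecated("submitClassic"));
        System.out.println(isDeprecated("submitNew"));
    }
}
```

Code in another class that calls `submitClassic` still compiles and runs; the compiler only reports a deprecation warning, which is the signal the two sides of this thread interpret differently.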
-
Re: [DISCUSSION] Release process
Doug Cutting 2010-04-02, 16:09
Owen O'Malley wrote:
> In my experience with releasing Hadoop, the bare minimum of scale > testing is a couple of weeks on 500 nodes (and more is far better) with > a team of people testing it. I think that releasing a 1.0 that has never > been tested at scale would be disastrous. For the record, I never proposed getting a 1.0 release out that had been scale tested in a few weeks. Rather, I proposed getting an alpha 1.0 release out in a few weeks. Bugfix releases from that, after testing, could then be made. My claim was that we might sooner get a stable Apache release that supports append via this route than from trunk. But, since there is opposition to this proposal, I will abandon it. Doug
-
Re: [DISCUSSION] Release process
Doug Cutting 2010-04-02, 17:08
Chris Douglas wrote:
> Speaking of the release vote process, I renew my request that we
> formalize both the RM role and the bylaws. -C

I think the HTTPD release rules are non-controversial and would support adoption of something similar. Someone needs to draft a proposal, initiate a discussion, refine the draft, and finally vote to enact it. It's kind of like a release, in that it's best if someone manages the process. Would you like to do this?

Similarly, for bylaws, we need someone to lead the process. The previous attempt started with a vote, rather than a discussion. That vote turned into a discussion that never turned back into a vote.

Doug
-
Re: [DISCUSSION] Release process
Chris Douglas 2010-04-05, 19:04
> So, yes, deprecation is just a warning to avoid these APIs, but deprecation
> is a stronger statement than you're portraying. It's not fair notice that
> the API may go away. It's final notice that the API should go away but for
> backward compatibility reasons it can't. Deprecated := don't use. You
> shouldn't deprecate an API unless there is an alternative or unless its use
> is actually dangerous.

With what part of the definition are you disagreeing? This seems compatible with what I wrote, if not a restatement of it. The APIs at issue have preferred alternatives and will be retained for backwards compatibility reasons. They seem to fit these criteria exactly.

Hadoop has used deprecation to signal intent to prune APIs and my point was that we don't need to actually remove them, but we do need to warn users that, for example, performance or compatibility with future features may be limited if they insist on the "classic" APIs.

-C
-
Re: [DISCUSSION] Release process
Chris K Wensel 2010-04-05, 21:16
> The APIs at
> issue have preferred alternatives and will be retained for backwards > compatibility reasons. Actually, from my perspective, re the 0.20 branch, they are not preferred alternatives and are not complete as more were introduced into .21 (of which many are wrappers around the stable apis for sake of transition). Which further complicates matters, it's an all or nothing switch, you can't use some new and some old in the same app (see the Configuration/JobConf property that flags the new apis in use). chris -- Chris K Wensel [EMAIL PROTECTED] http://www.concurrentinc.com
-
Re: [DISCUSSION] Release process
Chris Douglas 2010-04-05, 21:54
> Actually, from my perspective, re the 0.20 branch, they are not preferred alternatives and are not complete as more were introduced into .21 (of which many are wrappers around the stable apis for sake of transition).
Sorry, I must have been unclear, because this is part of the argument. FileContext is only in 0.21 and- as was acknowledged more than once- the mapreduce API is not useful in 0.20. However, the APIs in 0.21/trunk are both dev-preferred and usable. If Tom's 0.21 release is concurrent with the 0.20-based 1.0, efforts to move users to the new FileContext and mapreduce APIs will be undermined by the latter release.

Summarily: given that the APIs are *not* fully functional, preferred alternatives in 0.20- we shouldn't base our 1.0 release on it. Do you agree?

-C
-
Re: [DISCUSSION] Release process
Chris K Wensel 2010-04-06, 00:06
> Summarily: given that the APIs are *not* fully functional, preferred
> alternatives in 0.20- we shouldn't base our 1.0 release on it. Do you
> agree? -C

well said, but I still think a release is fine off .20 if we remove the deprecation warnings (and drop the new apis completely as they add confusion) as the stable apis work great and we need a well healed 1.0 sooner than later.

but I could be missing the finer points of this argument, so bear with me (or ignore me).

someone once said if you aren't embarrassed by your 1.0, you waited too long.

ckw

--
Chris K Wensel
[EMAIL PROTECTED]
http://www.concurrentinc.com
-
Re: [DISCUSSION] Release process
Allen Wittenauer 2010-04-06, 01:19
On Apr 5, 2010, at 5:06 PM, Chris K Wensel wrote: > > we need a well healed 1.0 sooner than later. Why?
-
Re: [DISCUSSION] Release process
Steve Loughran 2010-04-06, 12:55
Chris Douglas wrote:
>> Thus far the changes suggested for a 1.0 branch are:
>> - de-deprecate "classic" mapred APIs (no Jira issue yet)
>
> Why? Tom and Owen's proposal preserves compatibility with the
> deprecated FileSystem and mapred APIs up to 1.0. After Tom cuts a
> release- from either the 0.21 branch or trunk- then issues related to
> missing mapred.lib classes, partial implementations, etc. are
> ameliorated and they actually become usable. Telling users to ignore
> them and use the classic APIs only deepens our debt.
>
> I don't mind releasing 1.0 with the classic APIs. Given the installed
> base, it's probably required. But let's not kill the new APIs by
> calling them "experimental," thereby granting the old ones "official"
> status at the moment the new ones become viable.

I worry that declaring 0.20 the 1.0 release would lock things down too much. The new FS apis, the new MR apis should be the single set going forwards, and if you end up having to support the old and the new, then the only ones to target will be the old ones.

> Having spent 2009 in the shadow of 0.20, I oppose any decision that
> prevents Apache from releasing the last year of work, or backporting
> existing work *again* onto that branch. With 0.21 finally coming out,
> a line of 1.x releases based on 0.20 would kneecap Owen and Tom's
> effort to restart the project. -C
-
Re: [DISCUSSION] Release process
Steve Loughran 2010-04-06, 13:02
Allen Wittenauer wrote:
> On Apr 5, 2010, at 5:06 PM, Chris K Wensel wrote: >> we need a well healed 1.0 sooner than later. > > Why? > > I think it would be good for a 0.21 with the newly renamed artifacts hadoop-common, hadoop-hdfs and hadoop-mapred out there; I think the new APIs should be made available in a publicly usable state. I'd rather that than suddenly say "oh, too many people are using the unstable stuff we should give it a new number". By keeping it at 0.20, it makes clear its apis are unstable, -steve
-
Re: [DISCUSSION] Release process
Allen Wittenauer 2010-04-06, 16:00
On Apr 6, 2010, at 6:02 AM, Steve Loughran wrote: > Allen Wittenauer wrote: >> On Apr 5, 2010, at 5:06 PM, Chris K Wensel wrote: >>> we need a well healed 1.0 sooner than later. >> Why? > > I think it would be good for a 0.21 with the newly renamed artifacts hadoop-common, hadoop-hdfs and hadoop-mapred out there; I think the new APIs should be made available in a publicly usable state. I'd rather that than suddenly say "oh, too many people are using the unstable stuff we should give it a new number". By keeping it at 0.20, it makes clear its apis are unstable, My main point was that suddenly people seem to be hot to declare something 1.0. I'm trying to understand why, suddenly, various parts of the community seem to think 1.0 needs to happen. [The usual answer appears to be "adoption" but I think that's a bull-something reason masquerading as "commercial viability"... which to me should not be a primary concern around an *open source* software package. I can't help but wonder if there are now a bunch of companies that feel safe deploying OpenSSL based software since they finally declared 1.0. (Altho it would be nice to have a Hadoop 1.0 before 10 years elapse. *smile* But if that is as long as it takes, that's as long as it takes, IMO.) ]
-
Re: [DISCUSSION] Release process
Doug Cutting 2010-04-06, 19:05
Allen Wittenauer wrote:
> My main point was that suddenly people seem to be hot to declare something 1.0. I'm trying to understand why [...]

My rationale for suggesting a release named 1.0 was that I prefer that release numbers say something about compatibility. The compatibility rules we've used for Hadoop (which are not too different from what most would assume about versions) are that pre-1.0 releases may break compatibility with one another, while post-1.0 we'd only try to move folks to new, primary APIs at major releases. Programs written against 1.0 would run against any 1.x release, but may require modifications before they'd run against any 2.x or 3.x release. So a 1.0 release implies that we have APIs that we intend to support for considerably longer than a 0.x release.

It's now been proposed, post-fact, that the "classic" APIs in 0.20 will be supported long-term. So a 1.0 release with these APIs undeprecated would rationalize our version numbers, as we further refine their eventual replacements, what would become the 2.0 APIs.

We've long-delayed declaring 1.0 because we were afraid to commit to supporting a given API for a longer term. Now folks are willing to make that long-term commitment to an API, yet seem reluctant to call it 1.0.

I suppose there are lots of other things that folks could think that a 1.0 release implies. I've always argued that release numbers should be about compatibility and compatibility only.

Doug
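The compatibility rule described in this message can be sketched as a small version check. This is only an illustration under stated assumptions: the class `ReleaseCompat` and the method `expectedToRun` are hypothetical, and the extra condition that the running minor version must not be older than the one a program was built against is an assumption of this sketch, not something stated in the thread.

```java
public final class ReleaseCompat {
    private ReleaseCompat() {}

    /** Returns the numeric component at index of a dotted version such as "0.20.3". */
    static int part(String version, int index) {
        String[] parts = version.trim().split("\\.");
        return index < parts.length ? Integer.parseInt(parts[index]) : 0;
    }

    /**
     * Does the version numbering alone promise that a program built against
     * one release will run on another?
     */
    public static boolean expectedToRun(String builtAgainst, String runningOn) {
        if (builtAgainst.trim().equals(runningOn.trim())) {
            return true; // same release
        }
        int builtMajor = part(builtAgainst, 0);
        int runMajor = part(runningOn, 0);
        if (builtMajor < 1 || runMajor < 1) {
            return false; // pre-1.0 releases may break compatibility with one another
        }
        if (builtMajor != runMajor) {
            return false; // crossing a major release may require modifications
        }
        // Assumption of this sketch: a later 1.x may add APIs an earlier 1.x lacks.
        return part(runningOn, 1) >= part(builtAgainst, 1);
    }

    public static void main(String[] args) {
        System.out.println(expectedToRun("1.0.0", "1.4.2"));   // true
        System.out.println(expectedToRun("1.0.0", "2.0.0"));   // false
        System.out.println(expectedToRun("0.20.2", "0.21.0")); // false
    }
}
```

The check looks only at the first two numeric components, so it says nothing about bugfix releases or labels such as "alpha".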
-
Re: [DISCUSSION] Release process
Chris Douglas 2010-04-06, 21:08
> We've long-delayed declaring 1.0 because we were afraid to commit to
> supporting a given API for a longer term. Now folks are willing to make > that long-term commitment to an API, yet seem reluctant to call it 1.0. The commitment is to the new APIs. "Folks" are reluctant to cut a release without them and call it 1.0. Continuing to support the applications written in the old APIs is a pragmatic decision. The absolute purity of the API is secondary to supporting existing users, who don't want to rewrite their applications as a prerequisite to upgrading their clusters. New applications should be written in the new API, so the "classic" one is deprecated. This position is neither confusing nor contradictory. The old APIs won't be deleted, but the commitment to them is contingent and an accommodation for existing users. -C |