Eli Collins
2010-05-21, 20:42
Konstantin Shvachko
2010-06-29, 00:50
Eli Collins
2010-06-29, 03:03
Bernd Fondermann
2010-06-29, 15:29
Jay Booth
2010-06-29, 17:11
Bernd Fondermann
2010-06-29, 18:04
Eli Collins
2010-06-29, 18:02
Bernd Fondermann
2010-06-29, 18:10
Amr Awadallah
2010-05-22, 02:26
Nigel Daley
2010-05-22, 20:08
Jeff Hammerbacher
2010-05-25, 08:09
Steve Loughran
2010-05-25, 16:28
Eli Collins
2010-05-25, 19:42
Steve Loughran
2010-05-26, 10:24
Eli Collins
2010-05-26, 16:13
Jeff Hammerbacher
2010-05-31, 17:16
Eli Collins
2010-06-01, 21:28
Jeff Hammerbacher
2010-06-03, 05:45
-
[DISCUSSION] Proposal for making core Hadoop changes
Eli Collins 2010-05-21, 20:42

As HDFS and MapReduce have matured, the cost and complexity of introducing features have grown. Each new feature has to consider interactions with a growing set of existing features, a growing user base (upgrades, backwards compatibility) and additional use cases (more and more projects now build on them). At the same time we don't want the high bar for contribution to unnecessarily hinder new development and releases.

Many projects at a similar stage address this by adopting a more formal way to describe, socialize and shepherd enhancements to their platforms. Today, new features are often discussed via an umbrella jira, which may have an attached design document. There are a number of issues with this approach. The design documents vary in format and quality, and are often reviewed by a limited audience. They aren't version controlled. Sometimes the proposal is only partially specified. Jiras are often ignored. Understanding a proposal and its implications through a series of threads in the jira comments is difficult. It's hard for contributors and users to find these top-level jiras and follow their status.

I'd like to propose that core Hadoop adopt something similar to Python's PEP (Python Enhancement Proposal) [1]. A "HEP" would be a single primary mechanism for proposing new features, incorporating community feedback, and recording decisions. The author of the HEP would be responsible for building consensus and moving the feature forward. Similarly, some subset of the community would be responsible for reviewing HEPs in a timely manner and identifying missing pieces in the proposal. Discussion would occur before patches showed up on jira. People interested in the core Hadoop roadmap could keep an eye on the HEPs without the overhead of following jira traffic.

Why base this on the PEP? The format has proven useful to a substantial existing project, and I think the workflow is not too heavy-weight, and well-suited to a community such as ours. That being said, we could discuss other models (e.g. Java's JSR).

Before we get into specifics, is this something the community would like to adopt in some form? Does adapting the PEP and its workflow to our projects, community and bylaws seem reasonable?

Thanks,
Eli

1. http://www.python.org/dev/peps/pep-0001
-
Re: [DISCUSSION] Proposal for making core Hadoop changes
Konstantin Shvachko 2010-06-29, 00:50

Eli,

Just checking on the status of this proposal.

In the past I was hesitant about introducing more formalities. I now think we really need some mechanism for new feature and project proposals, and also for tracking decisions, for exactly the reasons you describe in your email. Whether it is going to be HEP or something else, it is best if we adopt it soon.

Thanks,
--Konstantin
-
Re: [DISCUSSION] Proposal for making core Hadoop changes
Eli Collins 2010-06-29, 03:03

Hey Konstantin,

Apologies for the delay, busy with stuff for the summit. I'll send a concrete proposal to general this week, based on our discussion at the contributors' meeting.

Thanks,
Eli
-
Re: [DISCUSSION] Proposal for making core Hadoop changes
Bernd Fondermann 2010-06-29, 15:29

On Tue, Jun 29, 2010 at 02:50, Konstantin Shvachko <[EMAIL PROTECTED]> wrote:
> Eli,
>
> Just checking on the status of this proposal.
>
> In the past I was hesitant about introducing more formalities.
> I now think we really need some mechanism for
> new feature and project proposals, also tracking decisions.

Making and tracking decisions at Apache is done via public ASF mailing lists, exclusively. Any other means of communication, including face-to-face, JIRA, IRC etc, is *not binding*. Every community member has equal say (only PMC members' votes are binding though). Committers can veto commits and commit to svn. PMC members have special rights and duties, too, as described in our Bylaws.

That's about it.

If Hadoop has issues tracking and making decisions, you won't fix that by introducing any formalities.

Bernd
-
Re: [DISCUSSION] Proposal for making core Hadoop changes
Jay Booth 2010-06-29, 17:11

Well, if people decide that some system more organized than email threads is better to keep track of major project proposals, it may help with some aspects of the project. Or it may not. The fact that a pro forma vote by email may also be required at some points to make something "official" shouldn't be a major reason against such a system, if it's otherwise a good idea.
-
Re: [DISCUSSION] Proposal for making core Hadoop changes
Bernd Fondermann 2010-06-29, 18:04

On Tue, Jun 29, 2010 at 19:11, Jay Booth <[EMAIL PROTECTED]> wrote:
> Well, if people decide that some system more organized than email
> threads is better to keep track of major project proposals, it may
> help with some aspects of the project.

As long as this system is under the control of our infra team, this is ok.

> Or it may not. The fact that
> a pro forma vote by email may also be required at some points to make
> something "official" shouldn't be a major reason against such a
> system, if it's otherwise a good idea.

Discussions must also take place on-list. There is no such thing as "pro forma" on-list activity.

Bernd
-
Re: [DISCUSSION] Proposal for making core Hadoop changes
Eli Collins 2010-06-29, 18:02

On Tuesday, June 29, 2010, Bernd Fondermann <[EMAIL PROTECTED]> wrote:
> On Tue, Jun 29, 2010 at 02:50, Konstantin Shvachko <[EMAIL PROTECTED]> wrote:
>> Eli,
>>
>> Just checking on the status of this proposal.
>>
>> In the past I was hesitant about introducing more formalities.
>> I now think we really need some mechanism for
>> new feature and project proposals, also tracking decisions.
>
> Making and tracking decisions at Apache is done via public ASF mailing
> lists, exclusively.

All proposals will be discussed and voted on the public lists. Per the original mail the proposal must be compatible with current bylaws.

Thanks,
Eli
-
Re: [DISCUSSION] Proposal for making core Hadoop changes
Bernd Fondermann 2010-06-29, 18:10

On Tue, Jun 29, 2010 at 20:02, Eli Collins <[EMAIL PROTECTED]> wrote:
> On Tuesday, June 29, 2010, Bernd Fondermann
> <[EMAIL PROTECTED]> wrote:
>> On Tue, Jun 29, 2010 at 02:50, Konstantin Shvachko <[EMAIL PROTECTED]> wrote:
>>> Eli,
>>>
>>> Just checking on the status of this proposal.
>>>
>>> In the past I was hesitant about introducing more formalities.
>>> I now think we really need some mechanism for
>>> new feature and project proposals, also tracking decisions.
>>
>> Making and tracking decisions at Apache is done via public ASF mailing
>> lists, exclusively.
>
> All proposals will be discussed and voted on the public lists. Per
> the original mail the proposal must be compatible with current bylaws.

Thanks for the clarification.

Bernd
-
Re: [DISCUSSION] Proposal for making core Hadoop changes
Amr Awadallah 2010-05-22, 02:26

> Does adapting the PEP and its workflow to our projects, community and
> bylaws seem reasonable?

+1
-
Re: [DISCUSSION] Proposal for making core Hadoop changes
Nigel Daley 2010-05-22, 20:08

+1 to better process around feature enhancements. I like that PEP includes process enhancements too. For comparison, does anyone have references to similar processes?

Cheers,
Nige
-
Re: [DISCUSSION] Proposal for making core Hadoop changes
Jeff Hammerbacher 2010-05-25, 08:09

> For comparison, anyone have a references to similar processes?
>

Java has the Java Community Process: http://jcp.org/en/home/index
-
Re: [DISCUSSION] Proposal for making core Hadoop changes
Steve Loughran 2010-05-25, 16:28

Jeff Hammerbacher wrote:
>> For comparison, anyone have a references to similar processes?
>>
>
> Java has the Java Community Process: http://jcp.org/en/home/index
>

a process that nobody liked, as in this comment by GregW of the jetty team on Servlet 3.0
http://blogs.webtide.com/gregw/entry/servlet_3_0_public_review

JCP has some advantages over standards bodies I've been in
* they recognise the value of tests.
* better remote collaboration
* more open to interested third parties
But that's it. Very vendor-managed, Sun was usually in charge, you'd be hard pressed to find anyone on the Apache jcp-open list (yes, we have one!) who is happy.

I haven't looked at Elliot's proposal in enough detail to comment, so here are my thoughts from working on the lifecycle stuff, and on other ASF projects

* evolution in the codebase is a good way of getting stuff to meet people's needs. If you have to have big branches until things are perfect you have the cost of maintaining branches, and it's harder for people to experiment with your stuff.
* If the cost of adding features is high -and maintaining branches, merging, identifying test failures is high- the barrier to participation is pretty steep. You need a team of engineers to work on every feature

-steve
-
Re: [DISCUSSION] Proposal for making core Hadoop changes
Eli Collins 2010-05-25, 19:42

On Tue, May 25, 2010 at 9:28 AM, Steve Loughran <[EMAIL PROTECTED]> wrote:
> Jeff Hammerbacher wrote:
>>>
>>> For comparison, anyone have a references to similar processes?
>>>
>>
>> Java has the Java Community Process: http://jcp.org/en/home/index
>>
>
> a process that nobody liked, such as this comment by GregW of the jetty team
> on JSP 3
> http://blogs.webtide.com/gregw/entry/servlet_3_0_public_review
>
> JCP has some advantage over standards bodies I've been in
> * they recognise the value of tests.
> * better remote collaboration
> * more open to interested third parties
> But that's it. Very vendor-managed, Sun was usually in charge, you'd be hard
> pressed to find anyone on the Apache jcp-open list (yes, we have one!) who
> is happy.

The JCP seems heavyweight, we'll want to make sure the pendulum doesn't swing too far in the opposite direction. It would be interesting to know if there are other good lightweight alternatives to the PEP, I looked and didn't turn up many.

> * evolution in the codebase is a good way of getting stuff to meet people's
> needs. If you have to have big branches until things are perfect you have
> the cost of maintaining branches, its harder for people to experiment with
> your stuff.
> * If the cost of adding features is high -and maintaining branches, merging,
> identifying test failures is high- the barrier to participation is pretty
> steep. you need a team of engineers to work on every feature

The cost of adding features has gotten high anyway (even without branching). It's a classic trade-off -- merge overhead vs moving faster without burdening others. As the overhead imposed on others increases, and tools (git) make it easier to live and collaborate on branches, branching makes more sense (you don't need a team of engineers or a dedicated merge engineer to maintain the branch).

You might find the following interesting:
http://incubator.apache.org/learn/rules-for-revolutionaries.html

Thanks,
Eli
-
Re: [DISCUSSION] Proposal for making core Hadoop changes
Steve Loughran 2010-05-26, 10:24

Eli Collins wrote:
> The cost of adding features has gotten high anyway (even without
> branching). It's a classic trade-off -- merge overhead vs moving
> faster without burdening others -- as the overhead imposed on others
> increases, and tools (git) make it easier to live and collaborate on
> branches it makes more sense

maybe, but if you are trying to keep >1 branch in sync, all the low cost refactorings become expensive to perform
-renaming variables
-hitting the reformat-code button to align the code with the project layout rules
-moving methods around

Life is simplest if you own the entire codebase and can move stuff around without any discussion. Closed source projects can do that, but even then it annoys other team members. In any OSS project, keeping stuff more stable makes it easier to take in third party patches, and ensures that stack traces from various versions all point to roughly the same code, always handy. Once you try to keep multiple branches alive, it becomes very hard to do big changes in trunk.

>(you don't need a team of engineers or
> dedicated merge engineer to maintain the branch).

No, but I'd estimate the cost of merging at 1-2 days' work a week just to pull in the code *and identify why the tests are failing*. Git may be better at merging in changes, but if Hadoop doesn't work on my machine after the merge, I need to identify whether it's my code, the merged code, some machine quirk, etc. It's the testing that is the problem for me, not the merge effort. That's Hadoop's own tests and my own functional test suites, the ones that bring up clusters and push work through. Those are the troublespots, as they do things that hadoop's own tests don't do, like ask for all the JSP pages.

> Might find the
> following interesting:
> http://incubator.apache.org/learn/rules-for-revolutionaries.html

There's a long story behind JDD's paper, I'm glad you have read it, it does lay out what is effectively the ASF process for effecting significant change -but it doesn't imply that's the only process for having changes.

One of the big issues is that in any successful project it becomes hard to do a big rewrite, and you end up with what was done early on, despite known issues. The "Some Thoughts on Ant 1.3 and 2.0" discussion is related to this: we -and I wasn't a committer at this time, just a user- weren't able to do the big rework, so we are left today with the design errors of the past (like the way undefined properties just get retained as ${undefined.property} instead of some kind of error appearing):
http://www.mail-archive.com/[EMAIL PROTECTED]/msg05984.html

I think gradual evolution in trunk is good, it lets people play with what's coming in. Having lots of separate branches and everyone's private release being a merge of many patches that you choose is bad. Because it means my version != your version != anyone else's, which implies that your tests mean nothing to me unless I also test at scale. Which I can do, but with different hardware and network configs from other people, it's still tricky to assign blame. Is it my merge that isn't working, is it some quirk of virtualisation underneath, or is it just this week's trunk playing up?
-
Re: [DISCUSSION] Proposal for making core Hadoop changes
Eli Collins 2010-05-26, 16:13

> No, but I'd estimate the cost of merging at 1-2 days work a week just to
> pull in the code *and identify why the tests are failing*. Git may be better
> at merging in changes, but if Hadoop doesn't work on my machine after the
> merge, I need to identify whether its my code, the merged code, some machine
> quirk, etc. It's the testing that is the problem for me, not the
> merge effort. That's the Hadoop own tests any my own functional test suites,
> the ones that bring up clusters and push work through. Those are the
> troublespots, as they do things that hadoop's own tests don't do, like as
> for all the JSP pages.

I've lived off a git branch of common/hdfs for half a year with a big uncommitted patch, it's nowhere near 1-2 days of effort per week to merge in changes from trunk. If the tests are passing on trunk, and they fail after your merge, then those are real test failures due to your change (and therefore should require effort). The issue with your internal tests failing due to changes on trunk is the same whether you merge or you just do an update - you have to update before checking in the patch anyway - so that issue is about the state of trunk when you merge or update, rather than about being on a branch.

>> Might find the
>> following interesting:
>> http://incubator.apache.org/learn/rules-for-revolutionaries.html
>
> There's a long story behind JDD's paper, I'm glad you have read it, it does
> lay out what is effectively the ASF process for effecting significant change
> -but it doesn't imply that's the only process for having changes.
>

Just to be clear I don't mean to imply that branches are the only process for making changes. Interesting that this is considered the effective ASF process, it hasn't seemed to me that recent big features on hadoop have used it, the only one I'm aware of that was done on a branch was append.

> I think gradual evolution in trunk is good, it lets people play with what's
> coming in. Having lots of separate branches and everyone's private release
> being a merge of many patches that you choose is bad.

Agreed. Personally I don't think people should release from branches. And in practice I don't think you'll see lots of branches, people can and would still develop on trunk. Getting changes merged from a branch back to trunk before the whole branch is merged is a good thing, the whole branch may never be merged and that's OK too. Branches are a mechanism, releases are policy.

Thanks,
Eli
-
Re: [DISCUSSION] Proposal for making core Hadoop changes
Jeff Hammerbacher 2010-05-31, 17:16

A far more lightweight example of multi-issue feature planning in an open source project comes from Drizzle and their "blueprints": https://blueprints.launchpad.net/drizzle.

Each "spec" has a drafter, an approver, and an assignee; declares the other specs on which it depends; points to the relevant branches in the source tree and issues in the issue tracker; and has a priority, definition state, and implementation state.

I don't know how it's working out for them in practice, but on paper it looks quite nice.
-
Re: [DISCUSSION] Proposal for making core Hadoop changes
Eli Collins 2010-06-01, 21:28
Hey Jeff,
Blueprints (a Launchpad feature) are more of an issue-tracking system (Launchpad doesn't put features/enhancements in its bug database); e.g. Drizzle has lots of blueprints, including blueprints for cleaning up code, adding config flags, etc. We'll use jira for that kind of work; the HEP is for larger changes that need more upfront discussion.

Thanks,
Eli
-
Re: [DISCUSSION] Proposal for making core Hadoop changes
Jeff Hammerbacher 2010-06-03, 05:45
Sure, each project can choose to use the framework on Launchpad in the way it sees fit. I wanted to call out Drizzle's use of metadata as being particularly nice. We may want to consider similar fields, and similar applications of those fields, for HEPs.