Eric Yang
2011-05-05, 02:39
Eli Collins
2011-05-05, 06:31
Tony Valderrama
2011-05-05, 09:51
Steve Loughran
2011-05-05, 11:35
Eric Yang
2011-05-05, 17:02
Eric Yang
2011-05-05, 17:12
Eric Yang
2011-05-05, 17:32
Todd Lipcon
2011-05-05, 17:52
Steve Loughran
2011-05-06, 12:44
Marcos Ortiz
2011-05-06, 14:16
Milind Bhandarkar
2011-05-06, 16:51
Roy T. Fielding
2011-05-07, 05:55
Eric Sammer
2011-05-07, 06:15
Scott Carey
2011-05-10, 18:06
-
[DISCUSSION] development process of Hadoop
Eric Yang 2011-05-05, 02:39
Let us reflect on how the Hadoop development community ended up in its current state. Development is happening rapidly and being tested in all kinds of organizations. However, Hadoop committers are only committing code that interests the sponsoring companies. People are coding defensively to ensure that only self-serving patches get committed, while helping others and resolving merge problems are always treated as secondary priorities. While the world demands agility, the "review then commit" (RTC) process is preventing progress. Committers are afraid to commit patches because review hasn't taken place. By the time a patch is reviewed, it no longer applies cleanly. People end up having to generate multiple versions of a patch to ensure the code can be applied. The long lag between patch generation and review is taking a significant toll on the community and on progress.

Yahoo has a great team of developers who improve Hadoop at a faster pace with their own fork of the source code. Yahoo was able to deliver features faster because it could use source code repository tools properly. Unfortunately for Yahoo, its source code repository was not Apache svn trunk. I applaud Owen and Arun's effort in manually backporting and forward-porting the changes between Yahoo's GitHub repository and Apache svn. There might be some JIRAs that need to be merged into the Hadoop 0.20.203 branch to ensure the lineage is correct. The community should offer to help with a detailed listing of what is missing rather than voting -1 without a concise explanation of what is missing.

JIRA is meant as a discussion and collaboration tool, but the Hadoop community tends to use it as the source code version control system, with a human-powered diff maker. While I was spending time in the incubator with other projects, the mentors explained that "review then commit" is not the ASF's philosophy.

The Hadoop community should rethink whether it is using the right tools for the right tasks.

Use JIRA when there is a large feature set that requires brainstorming, and give developers the ability to make small incremental changes without RTC. This will ensure developers help each other rather than policing each other.

Any thoughts?

Regards, Eric
-
Re: [DISCUSSION] development process of Hadoop
Eli Collins 2011-05-05, 06:31
On Wed, May 4, 2011 at 7:39 PM, Eric Yang <[EMAIL PROTECTED]> wrote:
> JIRA is meant as a discussion and collaboration tool, but the Hadoop
> community tends to use it as the source code version control system, with
> a human-powered diff maker. While I was spending time in the incubator
> with other projects, the mentors explained that "review then commit" is
> not the ASF's philosophy.

ASF's policy is that projects make this decision for themselves: http://www.apache.org/dev/project-creation.html

The Hadoop bylaws specify that code changes are approved by lazy consensus, i.e. you need a +1 from a committer. Technically the code doesn't have to be reviewed before it is committed; that's just been the norm. I don't think JIRA is technically required either; it's just been the norm. The vote for the patch has to happen on the lists, which happens as a side effect of JIRA traffic going to the dev lists.

> The Hadoop community should rethink whether it is using the right tools
> for the right tasks.
>
> Use JIRA when there is a large feature set that requires brainstorming,
> and give developers the ability to make small incremental changes without
> RTC. This will ensure developers help each other rather than policing
> each other.
>
> Any thoughts?

I think you can move quickly with RTC or CTR (commit then review); I've worked on RTC projects that have moved quickly. It requires that people dedicate bandwidth to reviewing changes. If you do want all your code reviewed (at some point), then you're ultimately limited by review bandwidth, with either RTC or CTR.

The time it takes to file a JIRA is normally insignificant compared to the time it takes to create and test a change. The idea with using JIRA is that you propose and discuss a change before writing code. You could do that on the lists too. I agree that using just a code review tool for small stuff would be faster, e.g. for things that don't require a bug number, release note, etc.

Thanks, Eli
-
Re: [DISCUSSION] development process of Hadoop
Tony Valderrama 2011-05-05, 09:51
Hi, I just wanted to drop in a few thoughts from a new developer working outside of the Hadoop developer community.

On Wed, May 4, 2011 at 7:39 PM, Eric Yang <[EMAIL PROTECTED]> wrote:
> While the world demands agility, the "review then commit" process is
> preventing progress. People end up having to generate multiple versions
> of a patch to ensure the code can be applied. The long lag between patch
> generation and review is taking a significant toll on the community and
> on progress.

> Yahoo has a great team of developers who improve Hadoop at a faster pace
> with their own fork of the source code. Yahoo was able to deliver
> features faster because it could use source code repository tools
> properly. Unfortunately for Yahoo, its source code repository was not
> Apache svn trunk.

I agree that the review process is broken. However, the current situation is exactly the result of a lack of adherence to this and other processes. Various subgroups within the community have (intentionally or unintentionally) hijacked the project at different times by avoiding community processes in the interest of agility or commercial benefit, and the result is a highly fragmented project with no clear direction.

From the outside, Hadoop looks like a Yahoo/Cloudera project which occasionally gets an Apache stamp. Given the lack of adherence to processes, as a non-Yahoo/Cloudera developer I have no way of breaking into the development community. Who's going to review or commit patches I submit? And which of the myriad versions should I even be trying to patch against? And given the speed with which undocumented changes are being made, how am I supposed to figure out if my changes are going to be relevant or viable next week? We'd love to contribute back, but it's just not clear that we or other small players have any place within the Hadoop developer community.

Here at Tuenti, like various other small-to-midsize Hadoop users, we've just forked 0.20 and devoted a couple of developers to maintaining features that we need. It would be nice to have shiny new features in the Yahoo branch or the Facebook branch or the Cloudera branch or the 0.22 branch (does Hadoop even have a trunk at the moment?), but we'll favor our own stable and familiar branch over the risky and hefty investment required to adopt a branch without clear community support.

> Use JIRA when there is a large feature set that requires brainstorming,
> and give developers the ability to make small incremental changes without
> RTC. This will ensure developers help each other rather than policing
> each other.

As an outsider, I have found JIRA to be the only way to follow the changes to Hadoop's code and guess where the project is heading. Permitting developers to commit without review or documentation will just further exclude anyone who can't walk down the hall and knock on an office door to ask about a commit.

Of course, take this with a grain of salt, since I don't claim to be a part of the Hadoop developer community and I don't foresee Tuenti ever playing a major role in the developer community.

~Tony
-
Re: [DISCUSSION] development process of Hadoop
Steve Loughran 2011-05-05, 11:35
On 05/05/11 10:51, Tony Valderrama wrote:
> From the outside, Hadoop looks like a Yahoo/Cloudera project which
> occasionally gets an Apache stamp. Given the lack of adherence to
> processes, as a non-Yahoo/Cloudera developer I have no way of breaking
> into the development community. Who's going to review or commit
> patches I submit? And which of the myriad versions should I even be
> trying to patch against? And given the speed with which undocumented
> changes are being made, how am I supposed to figure out if my changes
> are going to be relevant or viable next week? We'd love to contribute
> back, but it's just not clear that we or other small players have any
> place within the Hadoop developer community.

As someone who has commit rights but undercommits, here are my issues:

-I am not full time on Hadoop; I have little time to keep my own code up to date, let alone review patches
-I am not fully up to date with all the changes or subtleties in what is a big, complicated system
-I don't want to break the big systems (Y!, Facebook) by introducing changes that work on my network and my (small, dynamic) clusters but which place limitations on scale. It's why I prefer review by those people who do work on large scale projects.

> As an outsider, JIRA is the only way I've been able to follow the
> changes to Hadoop's code and guess where the project is heading.
> Permitting developers to commit without review or documentation will
> just further exclude anyone who can't walk down the hall and knock on
> an office door to ask about a commit.

I've worked in other ASF projects (Axis) where some large dev teams (IBM) used to make decisions in team meetings and propagate them. It's faster, but less community centric, and when a large dev team (IBM) gets reassigned internally everyone is left not just scrambling to catch up engineering-wise, but also to make sense of big chunks of under-documented code. The JIRA-based review process at least provides a discussion log, and Hudson/Jenkins checks that there are tests, no extra warnings, etc.

What could be interesting would be:

-a move to Git to make it easier to pull in patches from other branches, and for people like Tony to have their own fork under SCM
-adoption of Gerrit for having each JIRA issue move from being a patch to a branch (local or remote), so that people can develop the code for an issue, others can pull it in and merge it, and so that the issue tracks live code, not dead patches
-more testing of trunk in bigger real/virtual clusters

I don't know how we can do this; I'd love to hear about experiences others have with such a process.
-
Re: [DISCUSSION] development process of Hadoop
Eric Yang 2011-05-05, 17:02
Instead of depending on review-then-commit being the norm, Hadoop committers could take advantage of the svn JIRA plugin. People can actively commit to svn as long as a JIRA number is referenced in the commit. The commit message will show up in JIRA and leave a trail of activity for reference. Future committers can refer back to the code history to see why the code was written the way it was. This is less error prone than maintaining incremental patches. It seems like a solvable problem that only requires tweaking the behavior of the Hadoop committers.
Regards, Eric
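The convention Eric proposes in this message, that every svn commit reference a JIRA number, is easy to check mechanically. The following is only a rough sketch of such a check, not the svn JIRA plugin itself: the key format (an upper-case project name, a hyphen, a number), the function names, and the sample messages are all assumptions made for illustration.

```python
import re
from typing import List

# Assumption: issue keys look like PROJECT-123
# (upper-case project name, hyphen, number).
ISSUE_KEY = re.compile(r"\b[A-Z][A-Z0-9]*-\d+\b")


def referenced_issues(message: str) -> List[str]:
    """Return the distinct issue keys in a commit message, in order of appearance."""
    keys: List[str] = []
    for key in ISSUE_KEY.findall(message):
        if key not in keys:
            keys.append(key)
    return keys


def commit_allowed(message: str) -> bool:
    """Accept a commit only if its message references at least one issue key."""
    return len(referenced_issues(message)) > 0


if __name__ == "__main__":
    # Hypothetical commit messages; the issue numbers are made up.
    for msg in ("HADOOP-1234. Fix NPE in the example component", "fix typo"):
        print(commit_allowed(msg), referenced_issues(msg))
```

A Subversion pre-commit hook could obtain the pending log message with `svnlook log` and reject the transaction when `commit_allowed` returns False; that wiring is left out of the sketch.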
-
Re: [DISCUSSION] development process of Hadoop
Eric Yang 2011-05-05, 17:12
The bylaw requiring a +1 from a committer is probably the lowest barrier that Apache could set to prevent people from putting in code that might not be compatible with the Apache license. I also agree that getting a working trunk is probably the most important priority for Hadoop. Once the trunk is working, it will be much easier for Hadoop to grow new developers.
Regards, Eric
-
Re: [DISCUSSION] development process of Hadoop
Eric Yang 2011-05-05, 17:32
Git is powerful for maintaining different branches of the source code. However, it will only work if the entire community is willing to move to git. Maintaining an svn and git hybrid is a time-consuming task for which we are paying full price. The Hadoop community should work smarter on source control. What do people think about fully adopting git instead of svn?
Regards, Eric
-
Re: [DISCUSSION] development process of Hadoop
Todd Lipcon 2011-05-05, 17:52
On Thu, May 5, 2011 at 10:32 AM, Eric Yang <[EMAIL PROTECTED]> wrote:
> Git is powerful for maintaining different branches of the source code.
> However, it will only work if the entire community is willing to move to
> git. Maintaining an svn and git hybrid is a time-consuming task for which
> we are paying full price. The Hadoop community should work smarter on
> source control. What do people think about fully adopting git instead of
> svn?

+1 for Git as a tool. But using git makes it even _more_ important that we have a clearly defined release process that outlines which branches are meant to be released as official artifacts, and what the inclusion criteria for those branches should be.

-Todd

Todd Lipcon
Software Engineer, Cloudera
-
Re: [DISCUSSION] development process of Hadoop
Steve Loughran 2011-05-06, 12:44
On 05/05/11 18:52, Todd Lipcon wrote:
> On Thu, May 5, 2011 at 10:32 AM, Eric Yang <[EMAIL PROTECTED]> wrote:
>
>> Git is powerful for maintaining different branches of the source code.
>> However, it will only work if the entire community is willing to move to
>> git. Maintaining an svn and git hybrid is a time-consuming task for which
>> we are paying full price. The Hadoop community should work smarter on
>> source control. What do people think about fully adopting git instead of
>> svn?
>
> +1 for Git as a tool. But using git makes it even _more_ important that we
> have a clearly defined release process that outlines which branches are
> meant to be released as official artifacts, and what the inclusion criteria
> for those branches should be.

I'm +0.9995 for git: some bits I like, some bits I don't (it's awful for binary data). And you need more than just a release process locked down; you need the developers understanding and following a good process. If you have written down process docs there, I'd love to see them.

Apache infrastructure are discussing git. What would be best would be to start with a non-critical project, such as one or more of the moved contrib projects (like MR-Unit), so we can see that it, Gerrit, etc. work well within the Hadoop developer world.
-
Re: [DISCUSSION] development process of Hadoop
Marcos Ortiz 2011-05-06, 14:16
On 05/06/2011 08:14 AM, Steve Loughran wrote:
> On 05/05/11 18:52, Todd Lipcon wrote:
>> On Thu, May 5, 2011 at 10:32 AM, Eric Yang <[EMAIL PROTECTED]> wrote:
>>
>>> Git is powerful in maintaining different branch of the source code.
>>> However, it will only work if the entire community is willing to move to
>>> git. Maintaining svn and git hybrid, is a time consuming task that we are
>>> paying in full price. Hadoop community should work smarter for the source
>>> control. What do people think about fully adopting git instead of svn?
>>>
>>
>> +1 for Git as a tool. But using git makes it even _more_ important that we
>> have a clearly defined release process that outlines which branches are
>> meant to be released as official artifacts, and what the inclusion criteria
>> for those branches should be.
>>
>
> I'm +0.9995 for git: some bits I like, some bits I don't (it's awful
> for binary data). And you need more than just a release process locked
> down, you need the developers understanding following a good process.
> If you have written down process docs there, I'd love to see them.
>
> apache infrastructure are discussing git -what would be best would be
> to start with a non-critical project, such as one or more of the moved
> contrib projects (like MR-Unit), so we can see that it, gerrit, etc
> work well within the Hadoop developer world.
>

+1 for Git

We migrated from SVN to Git for our complete infrastructure, for many reasons:
- Git uses much less space than SVN; all the changes are in a single .git
- Git is awesome for branching
- Another great advantage is that there are many developers who know Git and how the development process can be greatly improved.

PostgreSQL, one of my favorite open source projects, which I use in my daily work, migrated its development process from CVS to Git.

Regards.

--
Marcos Luís Ortíz Valmaseda
Software Engineer (Large-Scaled Distributed Systems)
University of Information Sciences,
La Habana, Cuba
Linux User # 418229
http://about.me/marcosortiz
-
Re: [DISCUSSION] development process of Hadoop
Milind Bhandarkar 2011-05-06, 16:51
+1 for git.
When (not if) Apache Hadoop switches to git, I would recommend all to consider the branching model beautifully described in http://nvie.com/posts/a-successful-git-branching-model/.

- milind

--
Milind Bhandarkar
-
Re: [DISCUSSION] development process of Hadoop
Roy T. Fielding 2011-05-07, 05:55
Please do not turn this into yet another git discussion. The issues Hadoop is having, with extensive development on private branches rather than collaboration on a common Apache trunk, are a direct result of how many of the core developers are using git. This project has quickly become the poster boy for antisocial behavior via dvcs.

....Roy
-
Re: [DISCUSSION] development process of Hadoop
Eric Sammer 2011-05-07, 06:15
I think I speak for all the other mrunit committers when I say we're happy to be the guinea pigs on this.

On May 6, 2011, at 5:45 AM, Steve Loughran <[EMAIL PROTECTED]> wrote:

> apache infrastructure are discussing git -what would be best would be to start with a non-critical project, such as one or more of the moved contrib projects (like MR-Unit), so we can see that it, gerrit, etc work well within the Hadoop developer world.
-
Re: [DISCUSSION] development process of Hadoop
Scott Carey 2011-05-10, 18:06
On 5/6/11 7:16 AM, "Marcos Ortiz" <[EMAIL PROTECTED]> wrote:

>+1 for Git
>
>We migrated from SVN to Git for our completed infrastructure, for many
>reason:
>- Git use much less space than SVN, all the changes are in a single .git

FWIW, svn 1.7 will have a single DB file too, though that project has some chaos at the moment as well, and the release of 1.7 may be soon or a ways away. Svn is still slow over the network compared to git. Version 1.7 is also adding 'svn patch'.

>- Git is awesome for branching
>- Another great advantage is that there are many developers that know
>Git, and how the development process can be greatly improved.

There are also many developers who know svn, and many who don't know git. That is not a clear win.

>PostgreSQL, one of my favorites open source projects that I use on my
>daily work, migrated the development process to Git from CVS.

Almost anything is better than CVS.

I don't feel that svn is the primary cause of Hadoop's situation. Git would certainly help with merging patches that have become stale, and would especially help on the client side for developers who need to maintain many concurrent contexts. But there are many significant process issues at the heart of the problem that are not due to the tools.