-
[DISCUSS] secure 0.20-based branch
Doug Cutting 2010-04-23, 17:59
Y!'s developed extensive security features based on the 0.20 branch. The 0.20 versions of the individual patches appear in Jira, but these have not been committed to any branch in Apache's SVN. Y! has periodically pushed out versions of these as Yahoo!'s Distribution of Hadoop at github, and Cloudera is likely to make a 0.20-based distribution including these as well.
Shouldn't we commit these all to some 0.20-based branch at Apache? I'd earlier (on common-dev) suggested we might start a 1.0 branch based on 0.20, then add a 1.1 branch with the security patches. If that were done, the 0.21 release could perhaps instead be called 1.2. But, regardless of the naming, it would be good to have the 0.20 versions of all of the security patches committed to a branch at Apache so that we can make a release that includes them, patches can be targeted against this branch, etc.
What do others think?
Doug
-
Re: [DISCUSS] secure 0.20-based branch
Allen Wittenauer 2010-04-23, 18:34
On Apr 23, 2010, at 10:59 AM, Doug Cutting wrote:
> What do others think?
That 0.20 is not 1.0 quality, no matter how hard people want to believe it is true. The API may be stable, but the ops support sucks.
-
Re: [DISCUSS] secure 0.20-based branch
Doug Cutting 2010-04-23, 18:58
Allen Wittenauer wrote:
> That 0.20 is not 1.0 quality, no matter how hard people want to believe it is true.
Allen, my question was, "regardless of the naming" should we try to merge all of the 0.20-based security patches to a branch in Apache's subversion?
As for the naming, the major release number does not make a claim about quality or features, but rather about compatibility. 1.0 would presumably be the lowest quality and least featured release in the 1.x series, but everything in that series should be API compatible with 1.0. Every release in the 2.x series might not be compatible with 1.0. Point releases add features, dot releases add quality. So 1.0.1 would only improve quality, while 1.1.0 would add features while maintaining compatibility.
Doug
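The numbering convention described in this message (the major number signals API compatibility, the middle number adds features, the last number improves quality) can be sketched roughly as follows. This is only an illustrative Python sketch: the names `Version`, `parse`, `api_compatible`, and `change_kind` are hypothetical and do not correspond to any Hadoop or Apache release tooling.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """A three-part release number such as 1.0.1 (hypothetical helper type)."""
    major: int
    minor: int
    patch: int


def parse(text: str) -> Version:
    """Parse a string like '1.1.0' into a Version."""
    major, minor, patch = (int(part) for part in text.split("."))
    return Version(major, minor, patch)


def api_compatible(a: Version, b: Version) -> bool:
    """Releases in the same major series are expected to be API compatible."""
    return a.major == b.major


def change_kind(old: Version, new: Version) -> str:
    """Say what a move from `old` to `new` signals under this convention."""
    if new.major != old.major:
        return "compatibility"  # a new major series may break the API
    if new.minor != old.minor:
        return "features"       # e.g. 1.0.1 -> 1.1.0
    if new.patch != old.patch:
        return "quality"        # e.g. 1.0.0 -> 1.0.1
    return "none"
```

Under this reading, `change_kind(parse("1.0.0"), parse("1.0.1"))` yields `"quality"`, while `api_compatible(parse("1.0.0"), parse("2.0.0"))` is `False`, since a 2.x release is not guaranteed to be compatible with 1.0.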
-
[DISCUSS] secure 0.20-based branch
Chris Douglas 2010-04-23, 20:35
Please, let's not repeat the discussion on 0.20 as 1.0 in this thread.
I oppose the immortality of the 0.20 branch for the same reasons I opposed it on common-dev. From a technical perspective, nothing has been more destructive to the momentum and focus of this project than the perpetual backporting and development on this branch. Yahoo, Cloudera, and Facebook have their reasons for building fortresses on the sands of 0.20, but Apache has a year of development beyond that. It's a dark, unmapped jungle at the moment, but what you propose will only exacerbate that problem by establishing a fourth settlement on that sad oasis.
I vote no. Apache doesn't need to participate in the ridiculous exercise of porting 0.20 to 0.22. Why not support (and aid) Tom's effort to stabilize trunk? -C
On Friday, April 23, 2010, Doug Cutting <[EMAIL PROTECTED]> wrote:
> Allen Wittenauer wrote:
> > That 0.20 is not 1.0 quality, no matter how hard people want to believe it is true.
>
> Allen, my question was, "regardless of the naming" should we try to merge all of the 0.20-based security patches to a branch in Apache's subversion?
>
> As for the naming, the major release number does not make a claim about quality or features, but rather about compatibility. 1.0 would presumably be the lowest quality and least featured release in the 1.x series, but everything in that series should be API compatible with 1.0. Every release in the 2.x series might not be compatible with 1.0. Point releases add features, dot releases add quality. So 1.0.1 would only improve quality, while 1.1.0 would add features while maintaining compatibility.
>
> Doug
-
Re: [DISCUSS] secure 0.20-based branch
Doug Cutting 2010-04-23, 20:41
Chris Douglas wrote:
> Why not support (and aid) Tom's effort to stabilize trunk?
I do support Tom's efforts and do not see these as mutually exclusive.
Doug
-
Re: [DISCUSS] secure 0.20-based branch
Doug Cutting 2010-04-23, 20:51
Chris Douglas wrote:
> From a technical perspective, nothing has been more destructive to the momentum and focus of this project than the perpetual backporting and development on this branch.
I'm not proposing any more back- or forward-porting than will be done anyway. I'm proposing we commit all the 0.20 security patches to a repo at Apache. This could be as simple as copying Y!'s github repo wholesale to a branch at Apache. Or, with a bit more effort, we can probably apply the patches from Jira in the right order. Otherwise, Cloudera, Facebook and others will have to duplicate this effort. Then bugfix patches and releases can be made against this repo rather than everyone having to assemble and maintain their own 0.20 security patchset from scratch. Everyone will be using this patchset for quite some time. Why shouldn't we share a repo that contains it?
Doug
-
Re: [DISCUSS] secure 0.20-based branch
Dhruba Borthakur 2010-04-23, 21:30
I support Doug's idea whole-heartedly. The question that remains is "who gets to test and stabilize this new branch"? I am proposing that we designate an owner for this branch and it is the onus of the owner of this branch to test/stabilize that branch.
thanks,
dhruba
On Fri, Apr 23, 2010 at 1:51 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:
> Chris Douglas wrote:
>> From a technical perspective, nothing has been more destructive to the momentum and focus of this project than the perpetual backporting and development on this branch.
>
> I'm not proposing any more back- or forward-porting than will be done anyway. I'm proposing we commit all the 0.20 security patches to a repo at Apache. This could be as simple as copying Y!'s github repo wholesale to a branch at Apache. Or, with a bit more effort, we can probably apply the patches from Jira in the right order. Otherwise, Cloudera, Facebook and others will have to duplicate this effort. Then bugfix patches and releases can be made against this repo rather than everyone having to assemble and maintain their own 0.20 security patchset from scratch. Everyone will be using this patchset for quite some time. Why shouldn't we share a repo that contains it?
>
> Doug
--
Connect to me at http://www.facebook.com/dhruba
-
Re: [DISCUSS] secure 0.20-based branch
Doug Cutting 2010-04-23, 23:02
Dhruba Borthakur wrote:
> I support Doug's idea whole-heartedly. The question that remains is "who gets to test and stabilize this new branch"?
As a starting point, Y!'s distribution includes a patch list at:
http://github.com/yahoo/hadoop-common/commits/yahoo-hadoop-0.20
Cloudera lists its patches in the tarball inside the source rpm:
http://archive.cloudera.com/redhat/cdh/3/SRPMS/hadoop-0.20-0.20.2+228-1.src.rpm
We could perhaps start with the intersection of these and vote on that. If there are patches missing from this list that you believe are well tested and critical to Facebook, then these could be nominated as well. We'd require a consensus vote on each patch.
> I am proposing that we designate an owner for this branch and it is the onus of the owner of this branch to test/stabilize that branch.
I'm happy to call votes on patch lists, merge patches to the branch, run unit tests and roll release candidates, although I'd love help. If we stick to patches and combinations of patches that folks are already testing elsewhere, then this should be a stable branch.
The first release on this branch should be declared alpha until it's tested in a variety of environments. Folks should of course not immediately put into production any release from this branch (or any other branch) without some testing of their own.
If folks prefer to continue to use releases blessed by Y! or Cloudera, then we'd at least make the patch lists of those releases considerably shorter. This branch would simplify sharing of bugfixes even if we don't make releases from it, since it would already contain the patches common to most production environments.
Doug
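The suggestion to start from the intersection of the two patch lists can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `common_patches` is hypothetical, and the `EXAMPLE-n` identifiers are made-up placeholders, not real Jira issues or the actual contents of either distribution's patch list.

```python
from typing import Iterable, List


def common_patches(*patch_lists: Iterable[str]) -> List[str]:
    """Return the patch IDs present in every list, sorted for a stable vote order."""
    if not patch_lists:
        return []
    shared = set(patch_lists[0])
    for patches in patch_lists[1:]:
        shared &= set(patches)  # keep only IDs also found in this list
    return sorted(shared)


# Placeholder IDs only; the real lists would come from the two distributions.
yahoo_list = ["EXAMPLE-1", "EXAMPLE-2", "EXAMPLE-3"]
cloudera_list = ["EXAMPLE-2", "EXAMPLE-3", "EXAMPLE-4"]

candidates = common_patches(yahoo_list, cloudera_list)
print(candidates)  # ['EXAMPLE-2', 'EXAMPLE-3']
```

Patches outside the intersection (here `EXAMPLE-1` and `EXAMPLE-4`) would be the ones that need a separate nomination before a vote.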
-
[DISCUSS] secure 0.20-based branch
Chris Douglas 2010-04-24, 05:38
> I'm not proposing any more back- or forward-porting than will be done anyway.
That's probably true for this release, but what about HDFS-200? With security and sync in 0.20, there is less motivation to move back to trunk, which has diverged significantly. Moving off of 0.20 will be a struggle without these supplements. With a weak trunk, the justifications that led to the current state will remain in force.
> Or, with a bit more effort, we can probably apply the patches from Jira in the right order. Otherwise, Cloudera, Facebook and others will have to duplicate this effort.
Cloudera, Facebook, and others could also help to finish this work in trunk. Or clone from github if the need is dire.
> Then bugfix patches and releases can be made against this repo rather than everyone having to assemble and maintain their own 0.20 security patchset from scratch. Everyone will be using this patchset for quite some time. Why shouldn't we share a repo that contains it?
Because trunk is the shared repository that contains the security work. And a working append. And dozens of smaller, but important features including the 1.0 APIs. Symlinks. Optimizations to the shuffle. Splittable bzip compression. Stability and scalability fixes to the NameNode and JobTracker. Unicorns and happiness.
Stabilizing, packaging, and testing trunk is drudgery, but it can be shared.
I can see the value in restarting collaboration between major contributors by reestablishing a common branch, and 0.20 will probably be more successful in that respect, at least earlier. However, I continue to oppose sinking combined energy into 0.20 at the expense of trunk, for reasons already discussed at length. -C
> Doug
-
Re: [DISCUSS] secure 0.20-based branch
Steve Loughran 2010-04-26, 15:42
Doug Cutting wrote:
> Y!'s developed extensive security features based on the 0.20 branch. The 0.20 versions of the individual patches appear in Jira, but these have not been committed to any branch in Apache's SVN. Y! has periodically pushed out versions of these as Yahoo!'s Distribution of Hadoop at github, and Cloudera is likely to make a 0.20-based distribution including these as well.
>
> Shouldn't we commit these all to some 0.20-based branch at Apache? I'd earlier (on common-dev) suggested we might start a 1.0 branch based on 0.20, then add a 1.1 branch with the security patches. If that were done, the 0.21 release could perhaps instead be called 1.2. But, regardless of the naming, it would be good to have the 0.20 versions of all of the security patches committed to a branch at Apache so that we can make a release that includes them, patches can be targeted against this branch, etc.
>
> What do others think?
I'm biased as I don't have that much data in HDFS right now, but I'm mostly in favour of SVN_TRUNK as where things go, not adding new features to the existing stuff
-
Re: [DISCUSS] secure 0.20-based branch
Jakob Homan 2010-04-26, 16:53
I can't see any way in which staying mired in developing on 20 rather than trunk is beneficial to the long-term health of the project. 20.0 was released in late April, 2009 and we've not had a major release since then, which is hurting the project significantly. Packaging and testing a security-enabled release is very difficult, let me assure you, and will require a significant, concerted effort that would take away a huge number of cycles from releasing 21 - whatever form 21 takes - and fixing bugs on trunk. We're in the process of moving all the work we did on Y!'s 20 to trunk so that we can have a great 21 (&& 22).
-jakob
-
Re: [DISCUSS] secure 0.20-based branch
Scott Carey 2010-04-27, 21:15
On Apr 23, 2010, at 10:38 PM, Chris Douglas wrote:
>> I'm not proposing any more back- or forward-porting than will be done anyway.
>
> Because trunk is the shared repository that contains the security work. And a working append. And dozens of smaller, but important features including the 1.0 APIs. Symlinks. Optimizations to the shuffle. Splittable bzip compression. Stability and scalability fixes to the NameNode and JobTracker. Unicorns and happiness.
I'm for anything that gets all the goodies above out in a release. I don't care if they all get in one release or if it's spread out over 2 or 3. Right now, about 1/4 of the above (e.g. happiness, but no unicorns) is in CDH2/3. Trunk has stalled, getting new -- CORE -- features requires using other branches.
Although I would like to see the changes that these other branches have in apache's SVN, they belong in trunk. 0.20 is old already. It's the old, stable branch now and new stuff should go into newer releases. I've been waiting for things like the Shuffle refactor (30% performance improvement for some of my job flows) for a long time.
Just because Y! is not going to upgrade their deployment past their branch for a long time does not mean the rest of the community has to wait. I lived on 0.19.2 in production until very recently -- it became a solid branch without Y! or Facebook. Without the same testing muscle, it might take one or two more minor releases to stabilize, but the community's release schedule IMO desperately needs to become more independent of the biggest players.
Trunk should be moved forward and incorporate Cloudera and Yahoo's improvements aggressively. It's OK to have a 0.x.0 release that isn't completely stable yet, or backed by the biggest users. It is important to incorporate improvements made by productive contributors into actual releases in a timely fashion, or else those contributors will roll their own versions and eventually diverge significantly from the community rather than wait to get value from their work.
> Stabilizing, packaging, and testing trunk is drudgery, but it can be shared.
>
> I can see the value in restarting collaboration between major contributors by reestablishing a common branch, and 0.20 will probably be more successful in that respect, at least earlier. However, I continue to oppose sinking combined energy into 0.20 at the expense of trunk, for reasons already discussed at length. -C
I would love to see an apache release with new, useful features and enhancements. That could be a 0.20 with all or most of the Y! and Cloudera stuff in there. However, if any such effort slows down progress on trunk -- forget it. Get a 0.21 or 0.22 out with whatever features are ready, and move the ball forward on trunk. We should not encourage 0.20 to live forever.
0.21 and 0.22 should be releases that are compelling enough for Y!, Cloudera, and anyone else with their own customizations to want to move to for their own sake.
>> Doug