Mattmann, Chris A
2012-08-29, 02:33
Eric Baldeschwieler
2012-08-29, 03:45
Alejandro Abdelnur
2012-08-29, 03:50
Mattmann, Chris A
2012-08-29, 14:14
Robert Evans
2012-08-29, 15:17
Arun C Murthy
2012-08-29, 16:31
Suresh Srinivas
2012-08-29, 17:02
Arun C Murthy
2012-08-29, 17:04
Alejandro Abdelnur
2012-08-29, 17:13
Mattmann, Chris A
2012-08-29, 17:22
Suresh Srinivas
2012-08-29, 17:26
Michael Segel
2012-08-29, 17:26
Tom White
2012-08-29, 17:30
Eric Baldeschwieler
2012-08-29, 17:42
Arun C Murthy
2012-08-29, 18:22
Jun Ping Du
2012-08-29, 18:35
Konstantin Boudnik
2012-08-29, 18:41
Eli Collins
2012-08-29, 18:41
Arun C Murthy
2012-08-29, 18:48
Eli Collins
2012-08-29, 18:49
Tom White
2012-08-29, 20:34
Alejandro Abdelnur
2012-08-29, 20:40
Todd Lipcon
2012-08-29, 21:18
Jakob Homan
2012-08-29, 21:22
Travis Thompson
2012-08-29, 22:30
Mattmann, Chris A
2012-08-29, 23:19
Mattmann, Chris A
2012-08-29, 23:20
Konstantin Boudnik
2012-08-29, 23:27
Mattmann, Chris A
2012-08-29, 23:29
Mattmann, Chris A
2012-08-29, 23:32
Mattmann, Chris A
2012-08-29, 23:34
Todd Lipcon
2012-08-29, 23:35
Konstantin Boudnik
2012-08-29, 23:40
Todd Lipcon
2012-08-29, 23:44
Konstantin Boudnik
2012-08-29, 23:47
Todd Lipcon
2012-08-29, 23:48
Aaron T. Myers
2012-08-29, 23:53
Mattmann, Chris A
2012-08-29, 23:54
Todd Lipcon
2012-08-30, 00:16
Mattmann, Chris A
2012-08-30, 00:55
Arun C Murthy
2012-08-30, 01:47
Arun C Murthy
2012-08-30, 01:52
Konstantin Boudnik
2012-08-30, 02:59
Eli Collins
2012-08-30, 05:38
Eli Collins
2012-08-30, 05:46
Mattmann, Chris A
2012-08-30, 06:06
Eli Collins
2012-08-30, 06:18
Arun C Murthy
2012-08-30, 06:31
Mattmann, Chris A
2012-08-30, 06:31
Sharad Agarwal
2012-08-30, 06:41
Eli Collins
2012-08-30, 07:02
Alejandro Abdelnur
2012-08-30, 07:11
Eli Collins
2012-08-30, 07:17
Konstantin Shvachko
2012-08-30, 10:12
Arun C Murthy
2012-08-30, 10:25
Arun C Murthy
2012-08-30, 11:00
Arun C Murthy
2012-08-30, 12:29
Andrew Purtell
2012-08-30, 13:46
Mattmann, Chris A
2012-08-30, 13:51
Andrew Purtell
2012-08-30, 14:11
Aaron T. Myers
2012-08-30, 14:23
Brock Noland
2012-08-30, 14:43
Doug Cutting
2012-08-30, 16:17
Inder.dev Java
2012-08-30, 16:33
Doug Cutting
2012-08-30, 17:00
Owen O'Malley
2012-08-30, 18:25
Chris Douglas
2012-08-31, 01:24
Devaraj Das
2012-08-31, 01:28
Vinod Kumar Vavilapalli
2012-08-31, 03:35
Andrew Purtell
2012-08-31, 06:02
Mattmann, Chris A
2012-08-31, 06:15
Mattmann, Chris A
2012-08-31, 06:36
Andrew Purtell
2012-08-31, 06:42
Mattmann, Chris A
2012-08-31, 06:50
Andrew Purtell
2012-08-31, 07:55
Steve Loughran
2012-08-31, 11:54
Robert Evans
2012-08-31, 14:34
Mahadev Konar
2012-08-31, 15:05
Mattmann, Chris A
2012-08-31, 15:09
Roman Shaposhnik
2012-08-31, 15:59
Doug Cutting
2012-08-31, 16:00
Mattmann, Chris A
2012-08-31, 16:08
Eli Collins
2012-08-31, 16:54
Robert Evans
2012-08-31, 16:58
Todd Lipcon
2012-08-31, 16:59
Todd Lipcon
2012-08-31, 17:06
Alejandro Abdelnur
2012-08-31, 17:10
Alejandro Abdelnur
2012-08-31, 17:11
Jagane Sundar
2012-08-31, 17:24
Robert Evans
2012-08-31, 18:15
Inder.dev Java
2012-08-31, 19:00
Doug Cutting
2012-08-31, 20:44
Eric Baldeschwieler
2012-08-31, 22:43
Eric Baldeschwieler
2012-09-01, 00:23
Sharad Agarwal
2012-09-01, 09:59
Andrew Purtell
2012-09-01, 13:21
Andrew Purtell
2012-09-01, 13:32
Arun C Murthy
2012-09-03, 11:02
-
[DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Mattmann, Chris A 2012-08-29, 02:33
[decided to minimize traffic and to simply put this in one thread]
Hi Guys,

See the recent discussion on these threads:

YARN as its own Hadoop "sub project": http://s.apache.org/WW1
Maintain a single committer list for the Hadoop project: http://s.apache.org/Owx

...and just pay attention to the Hadoop project over the last 3-4 years. It's operating as a single project, and that's masking separate communities that themselves are really separate ASF projects.

At the ASF, this has been a problem area called "umbrella" projects, and over the years all I've seen from them is wasted bandwidth, artificial barriers, and the invention of new ways to perform process mongering and to reduce the fun in developing software at this fantastic foundation.

I've talked about umbrella projects enough. We've diverted conversation enough. Enough people have tried to act like there is some technical mumbo jumbo that is preventing the eventual act of higher power that I myself hope comes should these discussions prove unfruitful through normal means.

*these. are. separate. projects.*
*there.are.not.blocker.issues.from.spinning.out.these.projects.as.their.own.communities*

In this email: http://s.apache.org/rSm

And in the 2 subsequent follow-ons in that thread, I've outlined a process, which I'll copy below, for splitting these projects into their own TLPs:

-----snip
Process:

0. [DISCUSS] thread for <TLP name> in which you talk about #1 and #2 below, and potentially draft a resolution too.

1. Decide on an initial set of *PMC* members. I urge each new TLP to adopt PMC==C. See the reasons I've already discussed.

2. Decide on a chair. Try not to VOTE for this explicitly; see if it can be discussed and consensus can be reached (just a thought experiment). VOTE if necessary.

3. [VOTE] thread for <TLP name>

4. Create Project:
   a. paste the resolution from #0 to board@; or
   b. go to general@incubator and start a new Incubator project.

5. Infrastructure setup: moving MLs; new UNIX groups; website setup; SVN setup like this:

    svn copy -m "MR TLP." https://svn.apache.org/repos/asf/hadoop/ https://svn.apache.org/repos/asf/<insert cool MR name>; or
    svn copy -m "YARN TLP." https://svn.apache.org/repos/asf/hadoop/ https://svn.apache.org/repos/asf/<insert cool YARN name>; or
    svn copy -m "HDFS TLP." https://svn.apache.org/repos/asf/hadoop/ https://svn.apache.org/repos/asf/<insert cool HDFS name>

   After all 3 have been created, run:

    svn remove -m "Remove Hadoop umbrella TLP. Split into separate projects." https://svn.apache.org/repos/asf/hadoop

6. (TLPs if 4a; Incubator podling if 4b) Proceed, collaborate, operate as distinct communities, and try to solve the code duplication/dependency issues from there.

7. If 4b, then graduate as a TLP from the Incubator.

-----snip

So that's my proposal.

Thanks guys.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [EMAIL PROTECTED]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
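Step 5 of the process above has an ordering constraint: the umbrella tree is removed only after all three copies exist. As a hedged illustration (not part of the original proposal), the Java sketch below only builds the command strings as a dry run and executes nothing. The class `SplitPlan` and the repo names `example-mr`, `example-yarn` and `example-hdfs` are hypothetical placeholders, since the thread leaves the names open (`<insert cool MR name>` etc.).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SplitPlan {
    static final String ROOT = "https://svn.apache.org/repos/asf/";
    static final List<String> REQUIRED = Arrays.asList("MR", "YARN", "HDFS");

    // Dry run only: returns the step-5 svn commands as strings; nothing is executed.
    // newRepos maps each sub-project key ("MR", "YARN", "HDFS") to a (hypothetical) new repo name.
    static List<String> plan(Map<String, String> newRepos) {
        List<String> cmds = new ArrayList<>();
        for (Map.Entry<String, String> e : newRepos.entrySet()) {
            cmds.add(String.format("svn copy -m \"%s TLP.\" %shadoop/ %s%s",
                    e.getKey(), ROOT, ROOT, e.getValue()));
        }
        // "After all 3 have been created": emit the remove only when every
        // required sub-project has a copy in the plan.
        if (newRepos.keySet().containsAll(REQUIRED)) {
            cmds.add("svn remove -m \"Remove Hadoop umbrella TLP. Split into separate projects.\" "
                    + ROOT + "hadoop");
        }
        return cmds;
    }

    public static void main(String[] args) {
        // Placeholder names only; the real TLP names were not decided here.
        Map<String, String> repos = new LinkedHashMap<>();
        repos.put("MR", "example-mr");
        repos.put("YARN", "example-yarn");
        repos.put("HDFS", "example-hdfs");
        plan(repos).forEach(System.out::println);
    }
}
```

Passing only two sub-projects yields just the two copy commands and no remove, which mirrors the "after all 3" condition in the proposal.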
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Eric Baldeschwieler 2012-08-29, 03:45
+1
Over the course of this discussion I've become convinced it is time to split up Hadoop. Pig, Hive, ZooKeeper, HBase and other Hadoop graduates all seem to have been plagued by fewer meta-discussions and by-law fights, etc. since they graduated from Hadoop. Board members have been advising us to do this for years. With 1.0 stable and 2.0 on the way, now seems like a good time to do it. With Mavenization done and the advent of BigTop and multiple 3rd-party Hadoop distro packagers, there is little doubt that people concerned about consuming the work of the distinct projects will be able to get them to work together.
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Alejandro Abdelnur 2012-08-29, 03:50
Chris, thanks for initiating the discussion.
IMO a prerequisite to this is to figure out how we'll handle the following:

* Where does common stuff live?
* What are the public interfaces of each project (towards the other projects)?
* How do we do development/releases? In tandem? Separate? How will this work in practice? Currently we are constantly tweaking things across projects, sometimes in the same JIRAs, sometimes in follow-up JIRAs.

Thoughts?

Thxs.

Alejandro
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Mattmann, Chris A 2012-08-29, 14:14
Hi Alejandro,
On Aug 28, 2012, at 8:50 PM, Alejandro Abdelnur wrote:

> Chris, thanks for initiating the discussion.

No probs!

> IMO a pre-requisite to this is to figure out how we'll handle the following:

To be honest, I don't think any of the below are prereqs. They are technical issues that can be dealt with post facto, after just SVN copy'ing Hadoop as it stands today into each of the new TLPs per my SVN commands, and then using that as a starting point for doing the below as part of the natural evolution of the project code.

That being said, if I had to guess what the TLPs would do to address the below once they are created:

> * Where does common stuff lives?

This usually happens over time, depending on how often things release and on other things cited in other threads and other discussions over the past years in Hadoop. You guys clearly have a good handle on things like this. I would just encourage the subsequent TLPs not to worry about doing everything perfectly, and to realize that if you start out with the same code base, you can selectively and iteratively make things cleaner and better refactored, and the answers to questions like this will emerge naturally during that evolution.

> * What are the public interfaces of each project (towards the other projects)?

This is something that each distinct community can answer once they are bootstrapped as TLPs. You can decide what portion of the code is really under charter and then work as a community to figure this out. Sorry I can't be more specific than that.

> * How do we do development/releases? In tandem? Separate?

In tandem across communities never really works. Releases should occur separately, per community and TLP, on their own schedule. Code that depends on other projects either has to wait for those communities/TLPs/projects to fix things or add new features (or whatever), or insulate itself and keep the fixes locally in your project's SVN until those fixes can be pushed upstream and included in the other communities' releases, etc.

Ask yourself this: if you guys have a dependency on, e.g., Tomcat, and there is some critical bug or new feature you want in Tomcat, how would you deal with that? I would posit that you could deal with this situation the same way: keep the fix to Tomcat locally in your project, and work to get that fix upstream and included in some subsequent Tomcat release.

> How this
> will work in practice, currently we are constantly tweaking things
> inter-projects, sometimes in the same JIRAs, sometimes in follow up
> JIRAs.

Technically you are doing that, but community-wise it's not working out, and it hasn't really been working for years. I've been around Hadoop since its inception (I was a Nutch committer before Hadoop existed), and though it's been hugely successful, and really awesome and super great (congrats, everyone, BTW!), the community issues have always cropped up b/c it's one big huge umbrella project, and that doesn't work at Apache.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [EMAIL PROTECTED]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Robert Evans 2012-08-29, 15:17
I personally am for splitting up the projects. I think there is a lot of potential that each of the projects could have on their own, and I expect to see them evolve in new and interesting ways when the projects are not tied directly together.

But in order to get there we need to address the issues that made the first split attempt fail.

First off, we need to look at all API calls that MR, YARN, or HDFS make into common that are not @Stable, and either promote them to @Stable or remove the need for those calls.

Second, while we are doing that we need to look at the visibility of those APIs. How many APIs really need to be @LimitedPrivate, or should they be @Public? How many of the APIs have no designation at all?

Third, get truly serious about maintaining binary compatibility on @Stable APIs.

Fourth, we need to start splitting the projects up, starting with common. I think it would be cool to call it liBig, but I digress. Once common has been split out and is on its own for a few releases, we start splitting out HDFS, YARN, and MapReduce. For each of those we need to do a similar audit between the projects and fix the interdependencies between them. This is mostly dependencies between YARN and MR.

As part of this we also need to have a clear set of rules about what it takes to become a committer or PMC member for the new projects when they split off. I am fine with all committers becoming PMC members, but if we merge the lists now and simply say all previous committers become committers on the new TLPs, there will be a lot of committers/PMC members that have no real desire to be on those projects. I would propose that we merge the committer lists, but that all committers on the current project receive an invitation to become a committer on the new projects. ATM convinced me that committers know their boundaries and will self-censor. I believe that many committers will decline to become committers on the new projects, either because it is out of their area of expertise or because they are not involved with Hadoop any more, and will ignore the invitation.
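Robert's first three steps amount to an annotation audit of what each project calls in common. As a hedged sketch (not an existing Hadoop tool), the Java below declares local stand-ins that mimic Hadoop's audience/stability annotations (the real ones are `InterfaceAudience.Public`/`LimitedPrivate` and `InterfaceStability.Stable`/`Evolving` in `org.apache.hadoop.classification`) and reflectively flags, for one calling project, which used classes are undesignated, LimitedPrivate to other projects, or not @Stable. The classes `StableApi`, `RpcInternals` and `Undesignated` are hypothetical examples.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ApiAudit {
    // Local stand-ins for Hadoop's audience/stability annotations, declared here
    // so the sketch compiles without Hadoop on the classpath. RUNTIME retention
    // lets the audit read them via reflection.
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE) @interface Public {}
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE) @interface LimitedPrivate { String[] value(); }
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE) @interface Stable {}
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE) @interface Evolving {}

    // Hypothetical "common" classes that a downstream project calls into.
    @Public @Stable static class StableApi {}
    @LimitedPrivate({"HDFS", "MapReduce"}) @Evolving static class RpcInternals {}
    static class Undesignated {}

    // For a calling project, reports each used class that has no audience
    // designation, is LimitedPrivate to other projects, or is not @Stable.
    static List<String> audit(String caller, Class<?>... used) {
        List<String> findings = new ArrayList<>();
        for (Class<?> c : used) {
            String name = c.getSimpleName();
            LimitedPrivate lp = c.getAnnotation(LimitedPrivate.class);
            if (lp == null && !c.isAnnotationPresent(Public.class)) {
                findings.add(name + ": no audience designation");
                continue; // without an audience there is nothing further to check
            }
            if (lp != null && !Arrays.asList(lp.value()).contains(caller)) {
                findings.add(name + ": LimitedPrivate, not visible to " + caller);
            }
            if (!c.isAnnotationPresent(Stable.class)) {
                findings.add(name + ": not @Stable");
            }
        }
        return findings;
    }

    public static void main(String[] args) {
        audit("YARN", StableApi.class, RpcInternals.class, Undesignated.class)
                .forEach(System.out::println);
    }
}
```

For the YARN caller this reports `RpcInternals` twice (not visible to YARN, not @Stable) and `Undesignated` once, while `StableApi` passes. A real audit would scan compiled classes and call sites rather than a hand-written list.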
I fear that just voting and doing an svn copy -m will result in the same thing that happened last time. Someone will want to make a large change. This will require a change to something in common, and because it cannot easily be done in a backwards-compatible way, or because it will take three steps to complete the change instead of one, we will get frustrated. If this happens enough we will really get frustrated and try to merge the projects back together again. This is because the projects are too tightly coupled right now to really stand on their own. Just look at all of the security and token work that has been done recently. It has touched every single project and it has been a bit of a nightmare. It would be even worse if the projects were completely split apart.

I also want us to think about the timing of this. Do we really want to do this before 2.0 is GA? Doing this properly is probably going to be a several-month effort for one or two people, and a concerted effort by everyone not to break things while they work. If we have to rearchitect something so that the APIs can be marked stable, it may take a lot longer than that. Is it worth pushing the GA of 2.0 off by an entire quarter? For me I would say yes, but I know others have different opinions and different schedules.

@Chris, I can see your desire to do the split now and then deal with the fallout as we adapt to the changes. I think that would work, assuming that we are all completely committed to making the necessary changes. But the fact that we are having this discussion at all seems to indicate that we are not all completely committed to this, and I also feel that dealing with the fallout is going to take a lot longer if we don't try to address some of the problems up front.

Putting on my Yahoo! hat, I want to avoid as many problems and delays as I can, because my customers want a stable release of Hadoop with the features that are in 2.0. The longer it is delayed, the longer we stay on branch-0.23. A one-quarter delay because of this I am sure I can swing; more than that and I will start to get more pressure to pull in new features, which will probably mean that we then have to fork, which is something that I really do not want to do.

So I am +1 on merging the committer list, and +1 on splitting the projects. I would encourage us to at least do some planning and legwork up front before splitting. I am even +1 for setting a deadline date on which the svn -m will happen, whether we are ready or not.
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Arun C Murthy 2012-08-29, 16:31
On Aug 28, 2012, at 8:50 PM, Alejandro Abdelnur wrote:

> Chris, thanks for initiating the discussion.

Likewise, thanks Chris!

> IMO a pre-requisite to this is to figure out how we'll handle the following:

Good points. I'd recommend we keep Common and HDFS in the same project. Yes, MR/YARN will need some changes in Common occasionally, but core pieces like RPC have been maintained by HDFS folks over time anyway, e.g. the move to ProtoBufs was led by Sanjay, Suresh, Todd, Jitendra et al.

We can move SequenceFile into MR if necessary and keep the same package names for compatibility.

We should, of course, stop tweaking things in different projects in the same JIRA; we've been reasonably good at not doing that.

Thoughts?

Arun

Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Suresh Srinivas 2012-08-29, 17:02
I am +1 for splitting up the projects. This is the step in the right direction. There will be challenges along the way. I am confident we can solve them.

Robert and Alejandro have brought up good questions. Here are my thoughts:
- For the first one or two releases, all the projects can coordinate and do the releases together. This should help simplify the immediate work needed. This should also help us meet the release timelines that we are working towards. As the split makes progress, this cross-project coordination will no longer be necessary. I volunteer to RM these releases and do the needed coordination from HDFS.
- As regards APIs, currently we have LimitedPrivate APIs for related projects. This has been used by HBase as well. We need to think about a timeline by when we can mark these APIs stable. They should remain LimitedPrivate. Any rare changes to APIs require only coordination among the projects, and no user applications (which we have no control over) are affected.
- I agree with Arun that common can move with HDFS.

Regards,
Suresh

http://hortonworks.com/download/
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Arun C Murthy 2012-08-29, 17:04
On Aug 29, 2012, at 10:02 AM, Suresh Srinivas wrote:

> I am +1 for splitting up the projects. This is the step in the right
> direction. There will be challenges along the way. I am confident we can
> solve them.
>
> Robert and Alejandro have brought up good questions. Here are my thoughts:
> - For first one or two releases all the projects can coordinate and do the
> releases together. This should help simplify the immediate work needed.
> This should also help in us meeting the release timelines that we are
> working towards. As the split makes progress, this cross project
> coordination will no longer be necessary. I volunteer to RM these releases
> and do the needed co-ordination from HDFS.

+1, seems like a reasonable first step. Thanks for volunteering, Suresh.

> - As regards to APIs, currently we have LimitedPrivate APIs for related
> projects. This has been used by HBase as well. We need to think about a
> timeline by when we can mark these APIs stable. They should remain
> LimitedPrivate. Any rare changes to APIs requires only co-ordination among
> the projects and no user applications (which we have not control over) is
> affected.

Agreed.

Arun
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Alejandro Abdelnur 2012-08-29, 17:13
On Wed, Aug 29, 2012 at 10:02 AM, Suresh Srinivas
<[EMAIL PROTECTED]> wrote:
> - As regards to APIs, currently we have LimitedPrivate APIs for related
> projects. This has been used by HBase as well. We need to think about a
> timeline by when we can mark these APIs stable. They should remain
> LimitedPrivate. Any rare changes to APIs requires only co-ordination among
> the projects and no user applications (which we have not control over) is
> affected.
> - I agree with Arun that the common can move with HDFS.

So, this would mean that a bunch of common functionality needed by other TLPs (YARN, MR, HBase) which is not required by HDFS will end up in HDFS. I'm not necessarily against that, but it should be well understood/expected/accepted by the HDFS TLP, right?

Thx
Alejandro
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Mattmann, Chris A 2012-08-29, 17:22
Hi Bobby,
On Aug 29, 2012, at 8:17 AM, Robert Evans wrote:
> I personally am for splitting up the projects. I think there is a lot of
> potential that each of the projects could have on their own, and I expect
> to see them evolve in new and interesting ways when the projects are not
> tied directly together.
>
> But, in order to get there we need to address the issues that made the
> first split attempt fail. [..snip..]

Sorry I snipped the above, but mainly I just don't buy the argument that there are a bunch of technical things that *block* splitting the projects.

Today, right now, I could propose a new Incubator project and call it BooDoopADoop. I could add 5-7 (or 4) people that I think I would work well with. I could invite others to join in the Incubator as part of the initial PPMC list and committer list. We could write in our proposal that the existing Hadoop community is technically amazing, but over time has been mired by a bunch of community issues, and we'd like to take our crack at the source code in a brand new Apache project called BooDoopADoop. Then, for the code portion of the Incubator proposal, I could say I will svn copy all of Hadoop into BooDoopADoop and start from there.

So, given that I could do that (as could others), I would also have to be readily prepared for the community bad-will and general ASF bad-will that may cause. It may not cause ASF bad-will, b/c in general the foundation doesn't care about competing projects or technologies. It does care about splintering communities and the like, though. Moreover, beyond the Foundation concerns, I would also have to concern myself with pissing you guys off, along with all the downstream organizations, companies and individuals in the Hadoop ecosystem that may be pissed off about the way we injected code into BooDoopADoop. But again, nothing is stopping me from doing that.

I'd like to point out that in the above scenario, I don't have to worry about release schedules, or this, or that, or the other. Or APIs, or whatever. I have BooDoopADoop, and so does the new community around it in the Incubator, and we simply "go". Then, if others upstream or downstream find BooDoopADoop useful, they take it and incorporate it into their project. Perhaps Hadoop HDFS finds our improvements to BooDoopADoop and its distributed file system better, and perhaps we did some Maven magic and made our jar file better or more attractive to use, and it saved Hadoop HDFS coding and time and whatever. So Hadoop HDFS integrates it. See how this could work?

So, take me out of BooDoopADoop and replace me with the Hadoop PMC, and the specific subsets of you guys that are really distinct PMC members of distinct communities living within the Hadoop ecosystem. Sure, you want to technically work together on releases, and APIs, and whatever, but those are *inter-community* issues, more so than *intra-community* ones across the Foundation. Sure, it's good to try and coordinate, b/c you guys all have $dayjobs, and the software you build at those $dayjobs is contributed upstream into the ASF, and then others depend on it (and then others downstream of the ASF, and even downstream of your companies, depend on it, and so on and so forth). However, as far as the foundation is concerned, communities and projects (1:1 ideally) coordinate releases on an inter-community level, not intra-*. The intra-* is usually just icing, and way more difficult.

> As part of this we also need to have a clear set of rules about what it
> takes to become a committer or PMC member for the new projects when they
> split off. I am fine with all committers become PMC members,

+1 me too, and your suggestion below about "if we merge..." is one option for doing so. But there could be others, and discussing them and putting them up on a list is probably a good idea.

I would honestly suggest someone(s) take a stab at the lists of PMC members for the new TLPs, put something out there, and then -'ing people or adding them as needed. And yes, I fully agree that the PMC lists should not simply be the full Hadoop PMC per new TLP -- then we've just replicated the inherent problem 3x over instead of 1x over :) However, I don't know the ins and outs well enough to say who should be on those lists for HDFS, MR and YARN. I bet you guys do, though, so someone step up and throw something out there for others to shoot down....errr, I mean improve! :)

[...snip...]

See my BooDoopADoop. I don't think someone in new TLP X wanting to make a change in their copy of common will matter to TLP Y. It shouldn't. It *can*, over time, if there is coordination between X and Y, but it doesn't have to. Get what I mean? This is *not* a technical issue :) This is a community issue, independent of the technical issues, and it's about how to fix the community issues. But yes, if you guys want to release some upcoming version first or whatever, fine and dandy if the community agrees, but it shouldn't be a gate to fixing community issues.

This happens in the Incubator all the time. The big question with a project releasing and then having a graduation VOTE near that release (before or after) is: do we wait to graduate? I'm always a fan of just moving forward on graduation, b/c it's independent of the technical stuff.

Dealing with Hadoop technical problems is probably not my forte anymore (if it ever was : ) ). I'm here as a Foundation member trying to help with the community problems. In the end, forking is what you guys should do :) You should just do it at Apache. "Fork" the current Hadoop uber project into the communities that actually exist. You can fork directly out as TLPs, or incubate the forks. But doing it here would be great :)

Thanks for your thoughts, Bobby. Hope that explains where I am coming from.

Thanks!
Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Suresh Srinivas 2012-08-29, 17:26
>> - I agree with Arun that the common can move with HDFS.
>
> So, this would mean that a bunch of common functionality needed by
> other TPLs (YARN, MR, HBASE) which is not required by HDFS will end up
> in HDFS. I'm not necessary against that but it should be well
> understood/expected/accepted by HDFS TPL, right?
>

RPC is the main common functionality (not used by HBase). The rest are utilities related to native I/O, Configuration and other helpers. Other than RPC, we can move utils specific to a project into that project. In some cases there will be code duplication, and that is fine. We can make a call on those on a case-by-case basis.

--
http://hortonworks.com/download/
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Michael Segel 2012-08-29, 17:26
+1
On Aug 28, 2012, at 10:45 PM, Eric Baldeschwieler <[EMAIL PROTECTED]> wrote: > +1 > > Over the course of this discussion I've become convinced it is time to split up Hadoop. Pig, Hive, Zookeeper, HBase and other Hadoop graduates all seem to have been plagued by fewer meta-discussions and bi-law fights., etc since they graduated from Hadoop. Board members have been advising us to do this for years. With 1.0 stable and 2.0 on the way, now seems like a good time to do it. > > With mavenization done and the advent of BigTop and multiple 3rd party hadoop distro packagers, there is little doubt that people concerned about consuming the work of the distinct projects will be able to get them to work together. > > > > On Aug 28, 2012, at 7:33 PM, Mattmann, Chris A (388J) wrote: > >> [decided to minimize traffic and to simply put this in one thread] >> >> Hi Guys, >> >> See the recent discussion on these threads: >> >> YARN as its own Hadoop "sub project": http://s.apache.org/WW1 >> Maintain a single committer list for the Hadoop project: http://s.apache.org/Owx >> >> ...and just pay attention to the Hadoop project over the last 3-4 years. It's operating >> as a single project, that's masking separate communities that themselves are really >> separate ASF projects. >> >> At the ASF, this has been a problem area called "umbrella" projects and over the years, >> all I've seen from them is wasted bandwidth, artificial barriers and the inventions of >> new ways to perform process mongering and to reduce the fun in developing software >> at this fantastic foundation. >> >> I've talked about umbrella projects enough. We've diverted conversation enough. >> Enough people have tried to act like there is some technical mumbo jumbo that is >> preventing the eventual act of higher power that I myself hope comes should these >> discussions prove unfruitful through normal means. >> >> *these. are. separate. 
projects.* >> *there.are.not.blocker.issues.from.spinning.out.these.projects.as.their.own.communities* >> >> In this email: http://s.apache.org/rSm >> >> And in the 2 subsequent follow ons in that thread, I've outlined a process that I'll copy >> through below for splitting these projects into their own TLPs: >> >> -----snip >> Process: >> >> 0. [DISCUSS] thread for <TLP name> in which you talk about #1 and #2 below, potentially draft resolution too. >> >> 1. Decide on an initial set of *PMC* members. I urge each new TLP to adopt PMC==C. See reasons I've >> already discussed. >> >> 2. Decide on a chair. Try not to VOTE for this explicitly, see if can be discussed and consensus >> can be reached (just a thought experiment). VOTE if necessary. >> >> 3. [VOTE] thread for <TLP name> >> >> 4. Create Project: >> a. paste resolution from #0 to board@ or; >> b. go to general@incubator and start new Incubator project. >> >> 5. infrastructure set up. >> MLs moving; new UNIX groups; website setup; >> SVN setup like this: >> >> svn copy -m "MR TLP." https://svn.apache.org/repos/asf/hadoop/ https://svn.apache.org/repos/asf/<insert cool MR name>; or >> svn copy -m "YARN TLP." https://svn.apache.org/repos/asf/hadoop/ https://svn.apache.org/repos/asf/<insert cool YARN name>; or >> svn copy -m "HDFS TLP." https://svn.apache.org/repos/asf/hadoop/ https://svn.apache.org/repos/asf/<insert cool HDFS name> >> >> After all 3 have been created run: >> >> svn remove -m "Remove Hadoop umbrella TLP. Split into separate projects." https://svn.apache.org/repos/asf/hadoop >> >> 6. (TLPs if 4a; Incubator podling if 4b;) proceed, collaborate, operate as distinct communities, and try to solve the code duplication/dependency >> issues from there. >> >> 7. If 4b; then graduate as TLP from Incubator. >> >> -----snip >> >> So that's my proposal. >> >> Thanks guys. >> >> Cheers, >> Chris >> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >> Chris Mattmann, Ph.D. 
>> Senior Computer Scientist >> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Tom White 2012-08-29, 17:30
On Wed, Aug 29, 2012 at 5:31 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
>
> On Aug 28, 2012, at 8:50 PM, Alejandro Abdelnur wrote:
>
>> Chris, thanks for initiating the discussion.
>
> Likewise, thanks Chris!
>
>> IMO a pre-requisite to this is to figure out how we'll handle the following:
>
> Good points - I'd recommend we keep Common and HDFS in the same project.

That seems reasonable. The alternative would be to have a Common TLP, which we shouldn't necessarily dismiss, since more important than the size of the codebase is that there's a community to support the codebase, as there certainly is here. Having said that, a Common TLP lacks a clear 'mission' since it doesn't offer any standalone services. Also, it may diminish in utility over time if pieces are moved into HDFS, MapReduce and YARN.

> Yes, MR/YARN will need some changes in Common occasionally, but core pieces like RPC have been maintained by HDFS folks over time anyway e.g. move to ProtoBufs were led by Sanjay, Suresh, Todd, Jitendra et al.

Does the work to use versioned protocol buffers for RPC mean that different releases of HDFS and MapReduce can work together yet? If not, this is something we should be working towards (although that shouldn't block a move to TLPs).

> We can move SequenceFile into MR if necessary and keep same package names for compatibility.

There are also Hadoop tools like distcp, Hadoop archives, Streaming, etc., which should go with MapReduce.

Cheers,
Tom

> We should, of course, stop tweaking things in different projects in the same jira - we've been reasonably good at not doing that.
>
> Thoughts?
>
> Arun
>
>> * Where does common stuff lives?
>> * What are the public interfaces of each project (towards the other projects)?
>> * How do we do development/releases? In tandem? Separate? How this
>> will work in practice, currently we are constantly tweaking things
>> inter-projects, sometimes in the same JIRAs, sometimes in follow up
>> JIRAs.
>>
>> Thoughts?
>>
>> Thxs.
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Eric Baldeschwieler 2012-08-29, 17:42
Hi Tom,
> There are also Hadoop tools like distcp, Hadoop archives, Streaming,
> etc, which should go with MapReduce.

Good point. I agree.

> The alternative would be to have a Common TLP,
> which we shouldn't necessarily dismiss, since more important than the
> size of the codebase is that there's a community to support the
> codebase, as there certainly is here.

I guess the question is: who would want to be on that project? I don't think the current bundle of stuff in Common would form a good kernel for a community. The lack of a coherent community for Common has always been a problem with the project split, IMO. I could see folks deciding to build a community around a really good RPC stack, or some other chunk of Common, but frankly I think it is premature to do that. Proposals are welcome, of course, but I think the HDFS folks will want a copy of the RPC stuff in their project, and most of the rest of Common is too small to merit a project and is more easily handled via duplication, followed by sorting it out / dead code elimination.
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Arun C Murthy 2012-08-29, 18:22
On Aug 29, 2012, at 10:04 AM, Arun C Murthy wrote:
> On Aug 29, 2012, at 10:02 AM, Suresh Srinivas wrote:
>>
>> Robert and Alejandro have brought up good questions. Here are my thoughts:
>> - For first one or two releases all the projects can coordinate and do the
>> releases together. This should help simplify the immediate work needed.
>> This should also help in us meeting the release timelines that we are
>> working towards. As the split makes progress, this cross project
>> coordination will no longer be necessary. I volunteer to RM these releases
>> and do the needed co-ordination from HDFS.
>
> +1 seems like a reasonable first step. Thanks for volunteering Suresh.

Also, I'd say we make at least 3-4 alpha/beta releases in this shape. I volunteer to RM the MR/YARN releases and work with Suresh.

Arun
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Jun Ping Du 2012-08-29, 18:35
Hi Chris and all,
Thanks for initiating the discussion. May I say something from the perspective of a contributor rather than a committer or PMC member?

First, I have a feeling that the current Hadoop project process works well for contributors delivering a bug fix, but not so well for delivering a big feature. My bug-fixing work has gone smoothly: it gets a quick response from committers and gets checked in. However, I feel a little frustrated delivering a feature (~5K LOC, very important for Hadoop running well on virtualization infrastructure) across Common, HDFS, MapReduce and YARN. First, you have to figure out which committers to turn to for help on each component, then convince them of your ideas and work with them on reviewing and committing the code. Each committer has to understand the complete story and learn both the code pending review and the code already checked in. If some committers are super busy, the feature looks like it is pending forever. So, based on my experience, I have to say this process is not very friendly to contributors who come from different organizations with different backgrounds but share the same wish to contribute more to Apache Hadoop.

Based on this, if the Hadoop sub-projects are spun out as TLPs, I would be glad to see a concise committer list for each project, so committers can be more focused (with more bandwidth, maybe?) and contributors know whom to turn to for a quick response and help. On the other hand, I am concerned it may add complexity to the dependencies of features that cross sub-projects today, since you would have to figure out branches for each TLP, and it is hard to estimate when the code will come alive in each TLP's branch (this may add similar complexity for committers as well). I don't have many good suggestions, but I would be glad to see the process become smoother for contributors no matter what decision we make today. Just my 2 cents.
Thanks, Junping ----- Original Message ----- From: "Chris A Mattmann (388J)" <[EMAIL PROTECTED]> To: [EMAIL PROTECTED] Sent: Tuesday, August 28, 2012 7:33:58 PM Subject: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project [decided to minimize traffic and to simply put this in one thread] Hi Guys, See the recent discussion on these threads: YARN as its own Hadoop "sub project": http://s.apache.org/WW1 Maintain a single committer list for the Hadoop project: http://s.apache.org/Owx ...and just pay attention to the Hadoop project over the last 3-4 years. It's operating as a single project, that's masking separate communities that themselves are really separate ASF projects. At the ASF, this has been a problem area called "umbrella" projects and over the years, all I've seen from them is wasted bandwidth, artificial barriers and the inventions of new ways to perform process mongering and to reduce the fun in developing software at this fantastic foundation. I've talked about umbrella projects enough. We've diverted conversation enough. Enough people have tried to act like there is some technical mumbo jumbo that is preventing the eventual act of higher power that I myself hope comes should these discussions prove unfruitful through normal means. *these. are. separate. projects.* *there.are.not.blocker.issues.from.spinning.out.these.projects.as.their.own.communities* In this email: http://s.apache.org/rSm And in the 2 subsequent follow ons in that thread, I've outlined a process that I'll copy through below for splitting these projects into their own TLPs: -----snip Process: 0. [DISCUSS] thread for <TLP name> in which you talk about #1 and #2 below, potentially draft resolution too. 1. Decide on an initial set of *PMC* members. I urge each new TLP to adopt PMC==C. See reasons I've already discussed. 2. Decide on a chair. 
Try not to VOTE for this explicitly, see if can be discussed and consensus can be reached (just a thought experiment). VOTE if necessary. 3. [VOTE] thread for <TLP name> 4. Create Project: a. paste resolution from #0 to board@ or; b. go to general@incubator and start new Incubator project. 5. infrastructure set up. MLs moving; new UNIX groups; website setup; SVN setup like this: svn copy -m "MR TLP." https://svn.apache.org/repos/asf/hadoop/ https://svn.apache.org/repos/asf/<insert cool MR name>; or svn copy -m "YARN TLP." https://svn.apache.org/repos/asf/hadoop/ https://svn.apache.org/repos/asf/<insert cool YARN name>; or svn copy -m "HDFS TLP." https://svn.apache.org/repos/asf/hadoop/ https://svn.apache.org/repos/asf/<insert cool HDFS name> After all 3 have been created run: svn remove -m "Remove Hadoop umbrella TLP. Split into separate projects." https://svn.apache.org/repos/asf/hadoop 6. (TLPs if 4a; Incubator podling if 4b;) proceed, collaborate, operate as distinct communities, and try to solve the code duplication/dependency issues from there. 7. If 4b; then graduate as TLP from Incubator. So that's my proposal. Thanks guys. Cheers, Chris ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: [EMAIL PROTECTED] WWW: http://sunset.usc.edu/~mattmann/ ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Adjunct Assistant Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Konstantin Boudnik 2012-08-29, 18:41
Another way around this is to produce more than one Common artifact, providing a logical split for downstream projects like MR, and so on.

Cos

On Wed, Aug 29, 2012 at 10:26AM, Suresh Srinivas wrote:
> > > - I agree with Arun that the common can move with HDFS.
> >
> > So, this would mean that a bunch of common functionality needed by
> > other TPLs (YARN, MR, HBASE) which is not required by HDFS will end up
> > in HDFS. I'm not necessary against that but it should be well
> > understood/expected/accepted by HDFS TPL, right?
> >
>
> RPC is the main common functionality (not used by HBase). Others are some
> utilities related to native i/o, Configuration and other helper utils.
> Other than RPC projects we can move utils specific to a project into that
> project. In some cases if there is code duplication, that is fine. We can
> make a call on those on case by case basis.
>
> --
> http://hortonworks.com/download/
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Eli Collins 2012-08-29, 18:41
Thanks for writing up a proposal Chris.
I think it makes sense to have Common live in HDFS, at least for now, since HDFS is at the bottom of the stack / dependency chain, its code is the most intertwined with Common, and, per Arun, we tend to work on Common stuff more than MR people do. The HDFS project is really a lot more than HDFS, e.g. it has all the hadoop commands, non-HDFS file system source, etc., but that seems like an OK starting point. We need to figure out the committers and PMC, though, since the goal is to have just the HDFS community (vs. the current Hadoop people) while the project will contain non-HDFS stuff. I'd like to hear from the current Hadoop committers and PMC members who mostly work on MR and YARN: are you guys OK losing your current privileges on the HDFS repo? Otherwise we haven't made much progress (i.e. HDFS still has multiple communities).

We also need to address the areas where it's not so cut and dried, e.g. where there is a single Hadoop project:
- The Hadoop trademark: presumably this lives in the HDFS project if Common does?
- The user community, e.g. the user lists that we *just* merged: shall we still keep one list?
- We should move the global stuff like "how to get started" docs to Bigtop, which can point to the individual projects' resources.
- Hadoop 1.x is in maintenance mode, though it still actively gets patches, so we need to consider it. The surgery necessary to split v1 Hadoop is probably not suitable for a sustaining release and not worth it at this point in the lifetime of this branch. I assume the HDFS project will then host the Hadoop 1.x branches? This implies only members of the HDFS project can commit and release.
Thanks,
Eli
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Arun C Murthy 2012-08-29, 18:48
On Aug 28, 2012, at 7:33 PM, Mattmann, Chris A (388J) wrote:

> Process:
>
> 0. [DISCUSS] thread for <TLP name> in which you talk about #1 and #2 below, potentially draft resolution too.
>

How about something like this... please provide your feedback.

This is a very early draft, I'll post this on our wiki after discussion.

----

Proposal: Apache Hadoop HDFS as a TLP

I propose we graduate HDFS as a TLP named 'Apache Hadoop HDFS'.

I think the simplest way is to have all existing HDFS committers be committers and PMC members of the new project. That list is found in the asf-authorization-template which has:

hadoop-hdfs = acmurthy,atm,aw,boryas,cdouglas,cos,cutting,daryn,ddas,dhruba,eli,enis,eric14,eyang,gkesavan,hairong,harsh,jitendra,jghoman,johan,knoguchi,kzhang,lohit,mahadev,matei,mattf,molkov,nigel,omalley,ramya,rangadi,sharad,shv,sradia,stevel,suresh,szetszwo,tanping,todd,tomwhite,tucu,umamahesh,yhemanth,zshao

----

Proposal: Apache Hadoop MapReduce as a TLP

I propose we graduate MapReduce as a TLP named 'Apache Hadoop MapReduce'.

I think the simplest way is to have all existing MR committers be committers and PMC members of the new project. That list is found in the asf-authorization-template which has:

hadoop-mapreduce = acmurthy,amareshwari,amarrk,aw,bobby,cdouglas,cos,cutting,daryn,ddas,dhruba,enis,eric14,eyang,gkesavan,hairong,harsh,hitesh,jeagles,jitendra,jghoman,johan,kimballa,knoguchi,kzhang,llu,lohit,mahadev,matei,mattf,nigel,omalley,ramya,rangadi,ravigummadi,schen,sharad,shv,sradia,sreekanth,sseth,stevel,szetszwo,tgraves,todd,tomwhite,tucu,vinodkv,yhemanth,zshao

----

Proposal: Apache Hadoop YARN as a TLP

I propose we graduate YARN as a TLP named 'Apache Hadoop YARN'.

I re-propose, based on the previous discussion that the YARN committer list and initial PMC list be:

hadoop-yarn = acmurthy,cdouglas,ddas,hitesh,jeagles,llu,mahadev,sharad,sseth,tgraves,tomwhite,tucu,vinodkv

----

Thoughts?

thanks,
Arun
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Eli Collins 2012-08-29, 18:49
On Wed, Aug 29, 2012 at 11:22 AM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
> On Aug 29, 2012, at 10:04 AM, Arun C Murthy wrote:
>> On Aug 29, 2012, at 10:02 AM, Suresh Srinivas wrote:
>>>
>>> Robert and Alejandro have brought up good questions. Here are my thoughts:
>>> - For first one or two releases all the projects can coordinate and do the
>>> releases together. This should help simplify the immediate work needed.
>>> This should also help in us meeting the release timelines that we are
>>> working towards. As the split makes progress, this cross project
>>> coordination will no longer be necessary. I volunteer to RM these releases
>>> and do the needed co-ordination from HDFS.
>>
>>
>> +1 seems like a reasonable first step. Thanks for volunteering Suresh.
>
> Also, I'd say we make at least 3-4 alpha/beta releases in this shape.
>
> I volunteer to RM for MR/YARN releases and work with Suresh.
>

I volunteer to RM HDFS releases as well.

I think we should coordinate releases, but I don't think we should gate HDFS releases on MR and YARN releases; that will be one of the benefits of becoming a TLP. Unlike parts of MR and YARN, HDFS wasn't completely rewritten and so should be released on its own cycle, eg I think we'll be able to release a non-alpha / beta 2.0 much sooner than MR or YARN.

Thanks,
Eli
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Tom White 2012-08-29, 20:34
Eric - I agree with Common being included in HDFS. That's what I meant
by Common not having a clear enough mission to be a TLP by itself.

Arun - I'm happy to RM some of the upcoming MR releases too. Also to help out with the work on audience annotations and compatibility.

Cheers,
Tom
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Alejandro Abdelnur 2012-08-29, 20:40
I volunteer to help cleanup/normalize Maven stuff.
Thx

--
Alejandro
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Todd Lipcon 2012-08-29, 21:18
Have we not learned our lessons from the last attempts to split?

The issues in our community, which I think Chris is referring to, do not generally revolve around project boundaries. It's not the case that the HDFS community wants to go one way and the MR/YARN community wants to go another, and we get into a conflict around it. If it were, then splitting into separate TLPs would make a ton of sense.

Instead, the issues are usually _within_ a component. So, if we split into 3 TLPs, then we'll just have 3 TLPs, each of which is just as contentious as before.

Let's just embrace contention as a fact of life on a high-profile high-stakes project and get back to work.

I wasted nearly a month undoing the mess of the last attempt, and I don't see why this time it would go any better. -1 from my perspective on splitting again at this point. Perhaps if we get to the point that we're never making cross-project commits it makes sense, but we're not there still.

-Todd

--
Todd Lipcon
Software Engineer, Cloudera
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Jakob Homan 2012-08-29, 21:22
>
> Let's just embrace contention as a fact of life on a high-profile
> high-stakes project and get back to work.

+1
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Travis Thompson 2012-08-29, 22:30
+1
not that anyone knows who I am :)
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Mattmann, Chris A 2012-08-29, 23:19
Hi Eli,

On Aug 29, 2012, at 11:41 AM, Eli Collins wrote:

> Thanks for writing up a proposal Chris.

NP.

>
> I think it makes sense to have Common live in HDFS at least for now,
> since it's at the bottom of the stack / dependency chain and it's code
> is the most intertwined with common, and, per Arun, we tend to work on
> common stuff more than MR people. The HDFS project is really a lot
> more than HDFS, eg has all the hadoop commands, non-HDFS file system
> source, etc but that seems like an OK starting point. We need to
> figure out the committers and PMC though since the goal is to just
> have the HDFS community (vs the current Hadoop people) but the project
> will contain non-HDFS stuff. I'd like to hear from the current Hadoop
> committers and PMC members that mostly work on MR and YARN - are you
> guys OK losing your current privileges on the HDFS repo?

Rather than ask the former question that way, I would just simply put up a list of proposed HDFS PMC folks (yes, I keep using PMC ^_^). Then, iterate on that.

> Otherwise we
> haven't made much progress (ie HDFS still has multiple communities).

ACK.

>
> We also need to address the areas where it's not so cut and dry, eg
> where there is a single Hadoop project:
> - The Hadoop trademark, assume this lives in the HDFS project if Common does?

Apache owns the Hadoop trademark, and the PMC helps to enforce it. Projects don't own trademarks.

> - The user community, eg the users lists that we *just* merged, shall
> we still keep one list?

That's a good question -- maybe ask users to opt-in. Yes, this is intrusive, but I bet you'd find the real users of the specific projects if they have to resubscribe. Just my 2c.

> - We should move the global stuff like "how to get started" docs to
> Bigtop, which can point to individual projects resources

Sounds cool to me.

> - Hadoop 1.x is is maintenance mode, though it still actively gets
> patches so we need to consider it. The surgery necessary to split v1
> Hadoop is probably not suitable for a sustaining release and not worth
> it at this point in the lifetime of this branch. I assume the HDFS
> project will then host the Hadoop 1.x branches? This implies only
> members of the HDFS project can commit and release.

Why not put the 1.x stuff in Bigtop since it's global or whatever?

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [EMAIL PROTECTED]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Mattmann, Chris A 2012-08-29, 23:20
Arun, great work below. Concrete, and an actual proposal of PMC lists.

What do folks think?

Cheers,
Chris

On Aug 29, 2012, at 11:48 AM, Arun C Murthy wrote:

> How about something like this... please provide your feedback.
>
> [..snip..]
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Konstantin Boudnik 2012-08-29, 23:27
On Wed, Aug 29, 2012 at 11:19PM, Mattmann, Chris A (388J) wrote:
> Hi Eli,
>
> [..snip..]
>
> Sounds cool to me.
>
> > - Hadoop 1.x is is maintenance mode, though it still actively gets
> > patches so we need to consider it. The surgery necessary to split v1
> > Hadoop is probably not suitable for a sustaining release and not worth
> > it at this point in the lifetime of this branch. I assume the HDFS
> > project will then host the Hadoop 1.x branches? This implies only
> > members of the HDFS project can commit and release.
>
> Why not put the 1.x stuff in Bigtop since it's global or whatever?

Wearing my BigTop hat now, I encourage this audience to rush something like this to BigTop. If I am reading you correctly, you are asking BigTop to host 1.x branches of Hadoop, aren't you? I don't see how it fits in there, actually. But this is a separate issue that needs to involve BigTop community.

Cos
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Mattmann, Chris A 2012-08-29, 23:29
Hi Todd,

On Aug 29, 2012, at 2:18 PM, Todd Lipcon wrote:

> Have we not learned our lessons from the last attempts to split?
>
> The issues in our community, which I think Chris is referring to, do
> not generally revolve around project boundaries. It's not the case
> that the HDFS community wants to go one way and the MR/YARN community
> wants to go another, and we get into a conflict around it. If it were,
> then splitting into separate TLPs would make a ton of sense.

You're right, it's not project boundaries, it's poor community behavior, and general umbrella-project-ness.

One aspect I've seen is that exclusivity of allowing people to become PMC members on the project, and the separation of PMC from C. Other things I've seen are the use of technical justifications or complexity issues as an excuse for the exclusivity, as an excuse for drawing boundaries between project committers and PMC members, and then between specific products that the project and community as a whole releases, and finally other things I've seen include external interests influencing the way that business is done around here (need for releases in downstream companies, or projects driving upstream, Apache decisions, which are supposed to be independent of any lone company, or set of companies -- it's individuals here that do the work).

The above is not a discrete thing that's happened once, or twice, or that happened three times, but was fixed later. It's never been fixed.

>
> Instead, the issues are usually _within_ a component. So, if we split
> into 3 TLPs, then we'll just have 3 TLPs, each of which is just as
> contentious as before.

I doubt that. Creating TLPs either directly by going to the board, or via going to the Incubator should involve a set of members of the committee (PMC) that desire to work together; that ideally trust one another; that are inclusive to others who jump on the list and discuss things; and that collect these principles into the "Apache way", and build and deliver software at no cost to the public via this Foundation.

Currently, the Apache Hadoop project isn't doing that. Something needs to be done to fix it. Just because an attempt to split the projects in the past didn't work doesn't mean that the Hadoop community should just accept "this is a popular project; it's going to be contentious; nothing to see here folks".

It's more than that.

>
> Let's just embrace contention as a fact of life on a high-profile
> high-stakes project and get back to work.

-1 to that. Apache projects shouldn't be contentious, whether you are a billion dollar industry like Hadoop, or whether you are the US govt, or whether you are Joe Blow, Mom and Pop, building software to deliver to food truck vendors. It doesn't matter. Period.

>
> I wasted nearly a month undoing the mess of the last attempt, and I
> don't see why this time it would go any better. -1 from my perspective
> on splitting again at this point. Perhaps if we get to the point that
> we're never making cross-project commits it makes sense, but we're not
> there still.

Again, technical issues cited for community problems. *there are not technical issues*.

Cheers,
Chris
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Mattmann, Chris A 2012-08-29, 23:32
Hi Cos,

On Aug 29, 2012, at 4:27 PM, Konstantin Boudnik wrote:

>> Sounds cool to me.
>>
>>> - Hadoop 1.x is is maintenance mode, though it still actively gets
>>> patches so we need to consider it. The surgery necessary to split v1
>>> Hadoop is probably not suitable for a sustaining release and not worth
>>> it at this point in the lifetime of this branch. I assume the HDFS
>>> project will then host the Hadoop 1.x branches? This implies only
>>> members of the HDFS project can commit and release.
>>
>> Why not put the 1.x stuff in Bigtop since it's global or whatever?
>
> Wearing my BigTop hat now, I encourage this audience to rush something like
> this to BigTop. If I am reading you correctly, you are asking BigTop to host
> 1.x branches of Hadoop, aren't you? I don't see how it fits in there,
> actually. But this is a separate issue that needs to involve BigTop community.

Agreed that it would totally involve the BigTop community, and that that part is up to them. You guys would know this way better than me, so thanks for mentioning this issue Cos. I just kinda threw this out there but it's not a blocker for me -- whatever makes sense here and it's a good point raised by Eli that can probably be solved a number of different (easily solvable and documentable) ways :)

Cheers,
Chris
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Mattmann, Chris A 2012-08-29, 23:34
Really quick fix, sorry for the SPAM, going to take a break from replying after this:

On Aug 29, 2012, at 4:29 PM, Mattmann, Chris A (388J) wrote:

>> [..snip..]
>> there still.
>
> Again, technical issues cited for community problems. *there are not technical issues*.

s/there/these/

Cheers,
Chris
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Todd Lipcon 2012-08-29, 23:35
On Wed, Aug 29, 2012 at 4:20 PM, Mattmann, Chris A (388J)
<[EMAIL PROTECTED]> wrote:
> Arun, great work below. Concrete, and an actual proposal of PMC lists.
>
> What do folks think?

Already expressed my opinion above on the thread that the whole idea of splitting is crazy. But, I'll comment on some specifics of the proposal as well:

>>
>> I think the simplest way is to have all existing HDFS committers be committers and PMC members of the new project. That list is found in the asf-authorization-template which has:

Why? If we were to do this, why not take the opportunity to narrow down into the people who are actually active contributors to the project? (per your reasoning on the YARN thread)

>>
>> hadoop-hdfs = acmurthy,atm,aw,boryas,cdouglas,cos,cutting,daryn,ddas,dhruba,eli,enis,eric14,eyang,gkesavan,hairong,harsh,jitendra,jghoman,johan,knoguchi,kzhang,lohit,mahadev,matei,mattf,molkov,nigel,omalley,ramya,rangadi,sharad,shv,sradia,stevel,suresh,szetszwo,tanping,todd,tomwhite,tucu,umamahesh,yhemanth,zshao

Of these, only the following people have actually contributed more than 5 patches to common and HDFS in the last year:

Hairong Kuang (7):
Vinod Kumar Vavilapalli (7):
Daryn Sharp (8):
Matthew J. Foley (10):
Devaraj Das (11):
Mahadev Konar (15):
Eric Yang (18):
Sanjay Radia (18):
Thomas Graves (18):
Thomas White (21):
Konstantin Shvachko (23):
Steve Loughran (24):
Arun Murthy (32):
Uma Maheswara Rao G (36):
Jitendra Nath Pandey (51):
Harsh J (68):
Robert Joseph Evans (71):
Alejandro Abdelnur (106):
Suresh Srinivas (107):
Aaron Twining Myers (171):
Tsz-wo Sze (184):
Eli Collins (252):
Todd Lipcon (286):

So I would propose:

atm,daryn,ddas,eli,eyang,hairong,harsh,jitendra,mahadev,mattf,shv,sradia,stevel,suresh,szetszwo,todd,tomwhite,tucu,umamahesh

and listing the others as Emeritus, who could easily regain committer status if they started contributing again.

>>
>>
>> ----
>>
>>
>> Proposal: Apache Hadoop MapReduce as a TLP
>>
>> I propose we graduate MapReduce as a TLP named 'Apache Hadoop MapReduce'.
>>
>> I think the simplest way is to have all existing MR committers be committers and PMC members of the new project. That list is found in the asf-authorization-template which has:
>>
>> hadoop-mapreduce = acmurthy,amareshwari,amarrk,aw,bobby,cdouglas,cos,cutting,daryn,ddas,dhruba,enis,eric14,eyang,gkesavan,hairong,harsh,hitesh,jeagles,jitendra,jghoman,johan,kimballa,knoguchi,kzhang,llu,lohit,mahadev,matei,mattf,nigel,omalley,ramya,rangadi,ravigummadi,schen,sharad,shv,sradia,sreekanth,sseth,stevel,szetszwo,tgraves,todd,tomwhite,tucu,vinodkv,yhemanth,zshao
>>

Applying the same criteria, the list would be:

Suresh Srinivas (6):
Aaron Twining Myers (7):
Steve Loughran (7):
Ravi Gummadi (9):
Konstantin Shvachko (11):
Todd Lipcon (12):
Tsz-wo Sze (16):
Amar Kamat (17):
Harsh J (20):
Eli Collins (21):
Thomas White (27):
Siddharth Seth (46):
Thomas Graves (60):
Alejandro Abdelnur (71):
Robert Joseph Evans (107):
Mahadev Konar (118):
Vinod Kumar Vavilapalli (164):
Arun Murthy (209):

(this is based on git shortlog on the directories in the repository)

But I still think this discussion is silly, and we're not ready to do it.
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Konstantin Boudnik 2012-08-29, 23:40
On Wed, Aug 29, 2012 at 11:32PM, Mattmann, Chris A (388J) wrote:
> Hi Cos,
>
> On Aug 29, 2012, at 4:27 PM, Konstantin Boudnik wrote:
>
> [..snip..]
>
> Agreed that it would totally involve the BigTop community, and that that part
> is up to them. You guys would know this way better than me, so thanks for
> mentioning this issue Cos. I just kinda threw this out there but it's not a blocker

I think this might be a good idea really, but we need to think it over. I will start a thread on bigtop-dev@ to discuss what it means for us and how it can be done.

> for me -- whatever makes sense here and it's a good point raised by Eli that
> can probably be solved a number of different (easily solvable and documentable) ways :)

Exactly. As a general observation: there's always more than one solution for people who are willing to do stuff instead of pontificating.

Thanks for steering this up, actually!

Cos
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Todd Lipcon 2012-08-29, 23:44
On Wed, Aug 29, 2012 at 4:29 PM, Mattmann, Chris A (388J)
<[EMAIL PROTECTED]> wrote:
> You're right, it's not project boundaries, it's poor community behavior,
> and general umbrella-project-ness.

No doubt there's bad behavior. But splitting into smaller projects won't help anything. We'll still have the exact same behavior inside the smaller projects.

>
> One aspect I've seen is that exclusivity of allowing people to become
> PMC members on the project, and the separation of PMC from C.
> Other things I've seen are the use of technical justifications or complexity
> issues as an excuse for the exclusivity, as an excuse for drawing boundaries
> between project committers and PMC members, and then between specific
> products that the project and community as a whole releases, and finally
> other things I've seen include external interests influencing the way that
> business is done around here (need for releases in downstream companies,
> or projects driving upstream, Apache decisions, which are supposed to be
> independent of any lone company, or set of companies -- it's individuals here
> that do the work).
>

It's individuals that do the work, but the individuals get paid by companies, so individuals acting in their best interests are going to tend to align with their company. They also often know details about their customer bases that they can't share directly, which can be frustrating, but it's a fact of life. I'm sure we'd see the same if we were 20 independent consultants each with our own priorities, etc.

> The above is not a discrete thing that's happened once, or twice, or that
> happened three times, but was fixed later. It's never been fixed.
>

IMO it's massively improved since a couple years ago. We're making good progress on the 2.0 line, we no longer have divergent forks, and I haven't seen an issue get vetoed in recent memory. Please provide some recent examples where you think that splitting into smaller granularity projects would help anything.

>>
>> Instead, the issues are usually _within_ a component. So, if we split
>> into 3 TLPs, then we'll just have 3 TLPs, each of which is just as
>> contentious as before.
>
> I doubt that. Creating TLPs either directly by going to the board, or
> via going to the Incubator should involve a set of members of the
> committee (PMC) that desire to work together; that ideally trust one another; that
> are inclusive to others who jump on the list and discuss things; and that
> collect these principles into the "Apache way", and build and deliver software at
> no cost to the public via this Foundation.

Just because we argue doesn't mean we don't desire to work together. Smart passionate people will argue. I argue with my colleagues here at Cloudera, I argue with Hortonworkers, and I argue with Facebookers - it doesn't really matter much. I still enjoy getting beers with them when I end up at conferences. No hard feelings, we're all adults, right?

>
> Currently, the Apache Hadoop project isn't doing that. Something needs
> to be done to fix it. Just because an attempt to split the projects in the past
> didn't work doesn't mean that the Hadoop community should just accept
> "this is a popular project; it's going to be contentious; nothing to see here
> folks".

Again, please provide examples. From my vantage point, I see a lot of progress being made on critical features: we've done federation, HA namenode, massive performance improvements, YARN, practically rewritten NameNode, and more in the last couple years. Hardly an unproductive community.

>
> It's more than that.
>
>>
>> Let's just embrace contention as a fact of life on a high-profile
>> high-stakes project and get back to work.
>
> -1 to that. Apache projects shouldn't be contentious, whether you are a billion dollar
> industry like Hadoop, or whether you are the US govt, or whether you are Joe Blow,
> Mom and Pop, building software to deliver to food truck vendors. It doesn't matter.
> Period.

I guess we'll have to agree to disagree.

...says the guy who isn't on the hook to stitch it all back together into a deliverable for demanding customers, maintain green Jenkins builds, etc. You can say these aren't technical issues, but if you're not dealing with the project on a technical basis, I don't think you're well qualified to judge. I certainly appreciate the work you've done way back in the Nutch days and your continued evangelism, but this whole thread just seems like it's stirring up trouble and not going to accomplish anything except a bunch of wasted man-hours. (I've already wasted about 45 minutes today on it, oops!)

-Todd
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella projectKonstantin Boudnik 2012-08-29, 23:47
I am curious where the arbitrary number 5 is coming from: is it reflected in
the bylaws? Cos On Wed, Aug 29, 2012 at 04:35PM, Todd Lipcon wrote: > On Wed, Aug 29, 2012 at 4:20 PM, Mattmann, Chris A (388J) > <[EMAIL PROTECTED]> wrote: > > Arun, great work below. Concrete, and an actual proposal of PMC lists. > > > > What do folks think? > > Already expressed my opinion above on the thread that whole idea of > splitting is crazy. But, I'll comment on some specifics of the > proposal as well: > > >> > >> I think the simplest way is to have all existing HDFS committers be > >> committers and PMC members of the new project. That list is found in the > >> asf-authorization-template which has: > > Why? If we were to do this, why not take the opportunity to narrow > down into the people who are actually active contributors to the > project? (per your reasoning on the YARN thread) > > >> > >> hadoop-hdfs = acmurthy,atm,aw,boryas,cdouglas,cos,cutting,daryn,ddas,dhruba,eli,enis,eric14,eyang,gkesavan,hairong,harsh,jitendra,jghoman,johan,knoguchi,kzhang,lohit,mahadev,matei,mattf,molkov,nigel,omalley,ramya,rangadi,sharad,shv,sradia,stevel,suresh,szetszwo,tanping,todd,tomwhite,tucu,umamahesh,yhemanth,zshao > > Of these, only the following people have actually contributed more > than 5 patches to common and HDFS in the last year: > Hairong Kuang (7): > Vinod Kumar Vavilapalli (7): > Daryn Sharp (8): > Matthew J. 
Foley (10): > Devaraj Das (11): > Mahadev Konar (15): > Eric Yang (18): > Sanjay Radia (18): > Thomas Graves (18): > Thomas White (21): > Konstantin Shvachko (23): > Steve Loughran (24): > Arun Murthy (32): > Uma Maheswara Rao G (36): > Jitendra Nath Pandey (51): > Harsh J (68): > Robert Joseph Evans (71): > Alejandro Abdelnur (106): > Suresh Srinivas (107): > Aaron Twining Myers (171): > Tsz-wo Sze (184): > Eli Collins (252): > Todd Lipcon (286): > > So I would propose: > atm,daryn,ddas,eli,eyang,hairong,harsh,jitendra,mahadev,mattf,shv,sradia,stevel,suresh,szetszwo,todd,tomwhite,tucu,umamahesh > > and listing the others as Emeritus, who could easily regain committer > status if they started contributing again. > > >> > >> > >> ---- > >> > >> > >> Proposal: Apache Hadoop MapReduce as a TLP > >> > >> I propose we graduate MapReduce as a TLP named 'Apache Hadoop MapReduce'. > >> > >> I think the simplest way is to have all existing MR committers be committers and PMC members of the new project. 
That list is found in the asf-authorization-template which has: > >> > >> hadoop-mapreduce = acmurthy,amareshwari,amarrk,aw,bobby,cdouglas,cos,cutting,daryn,ddas,dhruba,enis,eric14,eyang,gkesavan,hairong,harsh,hitesh,jeagles,jitendra,jghoman,johan,kimballa,knoguchi,kzhang,llu,lohit,mahadev,matei,mattf,nigel,omalley,ramya,rangadi,ravigummadi,schen,sharad,shv,sradia,sreekanth,sseth,stevel,szetszwo,tgraves,todd,tomwhite,tucu,vinodkv,yhemanth,zshao > >> > > Applying the same criteria, the list would be: > > Suresh Srinivas (6): > Aaron Twining Myers (7): > Steve Loughran (7): > Ravi Gummadi (9): > Konstantin Shvachko (11): > Todd Lipcon (12): > Tsz-wo Sze (16): > Amar Kamat (17): > Harsh J (20): > Eli Collins (21): > Thomas White (27): > Siddharth Seth (46): > Thomas Graves (60): > Alejandro Abdelnur (71): > Robert Joseph Evans (107): > Mahadev Konar (118): > Vinod Kumar Vavilapalli (164): > Arun Murthy (209): > > (this is based on git shortlog on the directories in the repository) > > > But I still think this discussion is silly, and we're not ready to do it. > > -- > Todd Lipcon > Software Engineer, Cloudera
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella projectTodd Lipcon 2012-08-29, 23:48
On Wed, Aug 29, 2012 at 4:47 PM, Konstantin Boudnik <[EMAIL PROTECTED]> wrote:
> I am curious where the arbitrary number 5 is coming from: is it reflected in > the bylaws? Nope, I picked it based on Arun's earlier picking of the same number in the YARN thread. We have no bylaws about what would happen in the eventual TLP-ification of subcomponents, of course. -Todd -- Todd Lipcon Software Engineer, Cloudera
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella projectAaron T. Myers 2012-08-29, 23:53
On Wed, Aug 29, 2012 at 4:35 PM, Todd Lipcon <[EMAIL PROTECTED]> wrote:
> But I still think this discussion is silly, and we're not ready to do it. > +1 Despite many allusions to problems that this project split proposal would purport to solve, I honestly don't see the problems. Yes, Hadoop has had community problems in the past, but from my observation these have largely been addressed or are improving. We've been adding committers and PMC members, making more frequent releases, making sure that features show up on trunk first before other branches, generally been collaborating better, etc. Do we disagree from time to time? Sure. Are these disagreements across the sub-project boundaries? Not in my experience. Given that, what _actual problems_ will a project split solve? I _do_ see plenty of problems that a project split would create, such as difficulties with changes that span the projects, difficulties maintaining the interfaces of code that's shared by the projects, difficulties of a split user@ mailing list, etc. All of _these_ problems are well known to us from the previous "project split" which just split the mailing lists, code repos, and issue trackers. In the last few months, we've thought better of 2/3 of those decisions and actually merged back the repos and mailing lists. It's quite surprising to me to see many folks on this thread who supported these merges actually being in favor of splitting them again. Chris, you can dismissively say that these are "technical difficulties" but all of these problems directly impact the community as well. When the project repos were split, I personally helped many struggling users just getting their work environment set up to _compile_ the code. This was a pain for everyone, so we undid it. When the lists were split, users struggled to know where they should email their questions, and there was a lot of wasted effort telling folks to go ask this list or that. This was a pain for everyone, so we undid it. 
I think both of these changes have been tremendous _positive_ impacts on the community, and the haste with which we're rushing to undo them is very surprising to me. -- Aaron T. Myers Software Engineer, Cloudera
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella projectMattmann, Chris A 2012-08-29, 23:54
OK I lied and said I wouldn't reply :)
On Aug 29, 2012, at 4:44 PM, Todd Lipcon wrote: > On Wed, Aug 29, 2012 at 4:29 PM, Mattmann, Chris A (388J) > <[EMAIL PROTECTED]> wrote: > >> You're right, it's not project boundaries, it's poor community behavior, >> and general umbrella-project-ness. > > No doubt there's bad behavior. But splitting into smaller projects > won't help anything. We'll still have the exact same behavior inside > the smaller projects. > >> [..snip...] > >> The above is not a discrete thing that's happened once, or twice, or that >> happened three times, but was fixed later. It's never been fixed. >> > > IMO it's massively improved since a couple years ago. We're making > good progress on the 2.0 line, we no longer have divergent forks, and > I haven't seen an issue get vetoed in recent memory. Please provide > some recent examples where you think that splitting into smaller > granularity projects would help anything. Please provide examples that show umbrella projects work. I've been at this Foundation a lot longer than you have. I've seen them not work and have been involved in ones that don't work. See splits from Lucene, the same threads (with different names, different products, different software but the exact same issues). See your own splits from Hadoop cited elsethread. See the friggin' Apache board minutes discussing why umbrella projects are bad. I don't know what else to tell you. I'm not going to go look up all the threads. I'm not Google nor do I care to. All I can say is that I've seen it before and so have others. In your own project. > >>> >>> Instead, the issues are usually _within_ a component. So, if we split >>> into 3 TLPs, then we'll just have 3 TLPs, each of which is just as >>> contentious as before. >> >> I doubt that. 
Creating TLPs either directly by going to the board, or >> via going to the Incubator should involve a set of members of the >> committee (PMC) that desire to work together; that ideally trust one another; that >> are inclusive to others who jump on the list and discuss things; and that >> collect these principles into the "Apache way", and build and deliver software at >> no cost to the public via this Foundation. > > Just because we argue doesn't mean we don't desire to work together. > Smart passionate people will argue. I argue with my colleagues here at > Cloudera, I argue with Hortonworkers, and I argue with Facebookers - > it doesn't really matter much. I still enjoy getting beers with them > when I end up at conferences. No hard feelings, we're all adults, > right? You still point to arguing to contention -- it's more than that Todd. The project's policies for inclusivity have nothing to do with arguing about technical issues. > >> >> Currently, the Apache Hadoop project isn't doing that. Something needs >> to be done to fix it. Just because an attempt to split the projects in the past >> didn't work doesn't mean that the Hadoop community should just accept >> "this is a popular project; it's going to be contentious; nothing to see here >> folks". > > Again, please provide examples. From my vantage point, I see a lot of > progress being made on critical features: we've done federation, HA > namenode, massive performance improvements, YARN, practically > rewritten NameNode, and more in the last couple years. Hardly an > unproductive community. Technical issues, again. > [..snip..] > >>> >>> I wasted nearly a month undoing the mess of the last attempt, and I >>> don't see why this time it would go any better. -1 from my perspective >>> on splitting again at this point. Perhaps if we get to the point that >>> we're never making cross-project commits it makes sense, but we're not >>> there still. >> >> Again, technical issues cited for community problems. 
*there are not technical issues*. > > ...says the guy who isn't on the hook to stitch it all back together > into a deliverable for demanding customers, maintain green Jenkins Dude, you have to do that regardless, that has nothing to do with *Apache Hadoop*. Take your Cloudera hat off and put your *Apache Software Foundation* hat on. Is your #1 priority developing software here to stitch code back together, turn it into a deliverable for your customers (I'm guessing Cloudera customers, right? B/c Apache doesn't have specific customers?) and to maintain green Jenkins builds? Also tell me how the 4 SVN commands I suggested will stop you from doing the above? At Apache? At Cloudera, tell me also how it will stop you? I think you can quote me several times in this same thread and else-thread saying I'm not technically astute with Hadoop anymore :) Admitted. However, I *am* astute with the aspects of this Software Foundation. You had fun during those 45 mins don't lie :) P.S. I appreciate you and am still one of your biggest fans. Just trying to help you see the bigger picture here and to wear your Apache hat. Cheers, Chris ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: [EMAIL PROTECTED] WWW: http://sunset.usc.edu/~mattmann/ ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Adjunct Assistant Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella projectTodd Lipcon 2012-08-30, 00:16
On Wed, Aug 29, 2012 at 4:54 PM, Mattmann, Chris A (388J)
<[EMAIL PROTECTED]> wrote: > > Please provide examples that show umbrella projects work. Hadoop, in its current form? The code bases are tightly intertwined. We pulled out Pig/Hive/HBase because they were substantial codebases that didn't share much code with the rest, and thus could reasonably be expected to release independently. We could get HDFS and MR to that point, but we haven't yet, because they rely so much on Common. If we copy-paste forked Common, we'd be doubling our maintenance work on this shared code. We basically did this with the IPC code for HBase, and then we had double the work to protobuf-ify both HBase and HDFS/MR earlier this year. I know because I spent a bunch of hours on both. > I've been > at this Foundation a lot longer than you have. I've seen them not work > and have been involved in ones that don't work. See splits from Lucene, > the same threads (with different names, different products, different software > but the exact same issues). See your own splits from Hadoop cited elsethread. > See the friggin' Apache board minutes discussing why umbrella projects > are bad. > > I don't know what else to tell you. I'm not going to go look up all the threads. > I'm not Google nor do I care to. All I can say is that I've seen it before and > so have others. In your own project. > What's one concrete example of where it would be better if we split? I can't think of any. We'd still have competing interests in HDFS, and we'd still get in the same arguments. To say that all ASF projects should work the same seems pretty bizarre to me. The ASF provides license protection, infrastructure, and a set of guidelines for what makes successful projects. But I don't think it is the foundation's place to dictate what its projects should do "from above" if the projects themselves do not see a problem. If the project is so messed up, then maybe some folks should fork it into the incubator like you've suggested? 
What's wrong with the anarchic "let the best project succeed" philosophy, which I've also heard from Apache? > You still point to arguing to contention -- it's more than that Todd. The project's > policies for inclusivity have nothing to do with arguing about technical issues. I'm absolutely for meritocracy. I just have a high bar for what should be considered "merit". Perhaps the PMC as a whole has a high bar. For a system that stores my data, I'm pretty happy about that. > > Dude, you have to do that regardless, that has nothing to do with *Apache Hadoop*. > Take your Cloudera hat off and put your *Apache Software Foundation* hat on. Is your > #1 priority developing software here to stitch code back together, turn it into a deliverable > for your customers (I'm guessing Cloudera customers, right? B/c Apache doesn't have > specific customers?) and to maintain green Jenkins builds? Yes? I think so? If we do a bad release and it loses substantial data, our user base would disappear quite quickly. > > Also tell me how the 4 SVN commands I suggested will stop you from doing the above? > At Apache? If the projects are on separate release schedules, this means that cross-project changes have to be staged across the projects in such a way that neither project breaks in the interim. All of our internal APIs become public APIs. We worked like this for around a year during the "project split" period. It was super complicated and our builds were often red, we wasted a lot of time, and new users couldn't figure out how to contribute. In the absence of a reasonable *technical* strategy to release independently, and a lot of work to stabilize internal APIs around security and IPC in particular, doing it again would cause the same problems it caused the first time. It also makes the users' lives much more difficult, or forces them to only consume via downstream packagers.
Earlier in this thread, you seemed to think that downstream packagers indicated an issue with the community: fracturing the releases would only serve to make the ASF download page even less useful for someone who just wants to get going fast. If the projects were on different release schedules, then we'd be more likely to have to do a lot of local patching to get stuff to "fit together" right. Version compatibility is a difficult problem - it multiplies the QA matrix, complicates deployment, etc. It's not insurmountable, but unless there's something to be gained (what is it, again, that you think we'd gain, specifically?) I don't see why we'd take this additional hassle. Thanks for that. As for Apache vs Cloudera hat: I think they're well aligned here. Both hats want the project to be easy for people to contribute to, and want to avoid a bunch of wasted time spent on new technical issues that this would create. I want to spend that time making the product better, for our users' benefit. Whether the users are Apache community users, or Cloudera customers, or Facebook's data scientists, they all are going to be happier if I spend a month improving our HA support compared to spending a month figuring out how to release three separate projects which somehow stitch together in a reasonable way at runtime without jar conflicts, tons of duplicate configuration work, byzantine version dependencies, etc. -Todd Todd Lipcon Software Engineer, Cloudera
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella projectMattmann, Chris A 2012-08-30, 00:55
Hey Todd,
On Aug 29, 2012, at 5:16 PM, Todd Lipcon wrote: > On Wed, Aug 29, 2012 at 4:54 PM, Mattmann, Chris A (388J) > <[EMAIL PROTECTED]> wrote: >> >> Please provide examples that show umbrella projects work. > > Hadoop, in its current form? I don't agree that it's working. That's where you and I differ. And not just you and I, you and the others that have agreed with me else-thread. Technically, the project is working for sure. Community-wise, no. I guess we can agree to disagree. > > > If we copy-paste forked Common, we'd be doubling our maintenance work > on this shared code. Who's "we"? You? Would you expect to be a PMC member/committer in all split projects? Also, are you the only person working on the project? And the "we" would include others, right? Who may or may not be committers on the other projects? I'm not proposing SVN copy and then all PMC members x N projects. Figure out who are on the PMCs for the distinct communities that are operating on this hydra. >> >> I don't know what else to tell you. I'm not going to go look up all the threads. >> I'm not Google nor do I care to. All I can say is that I've seen it before and >> so have others. In your own project. >> > > What's one concrete example of where it would be better if we split? Training off bad community practices is difficult, I'll agree with you on that. Hopefully if these new projects went the Incubator route, you could get some other fuddy-duddies like me that have been around and seen a lot at the Foundation helping the new projects really understand the community aspects. > > To say that all ASF projects should work the same seems pretty bizarre > to me. Please show me where I said the above sentence? > The ASF provides license protection, infrastructure, and a set > of guidelines for what makes successful projects. Guidelines which the Apache Hadoop PMC continues not to follow. Technically successful yes. Community-wise successful, sorta.
> But I don't think it > is the foundation's place to dictate what its projects should do "from > above" if the projects themselves do not see a problem. No, but it's the Foundation's (and its members') responsibility to ensure that its projects are behaving in that loosely coupled set of principles and guidelines that we call the Apache way. Apache Hadoop is doing great technically. Not so sure about the Apache way part. > > If the project is so messed up, then maybe some folks should fork it > into the incubator like you've suggested? What's wrong with the > anarchic "let the best project succeed" philosophy, which I've also > heard from Apache? Yeah I proposed that too. We'll see if it happens. Concretely, I think all of the current Hadoop "sub projects" should take a spin through the Incubator and see how they are doing as projects. If nothing is afoul, I'm sure it would be a pretty quick process, right? Add some new PPMC members/committers, make a release or two, make sure all software is ALv2 and compat. You guys are already doing that, right? > >> You still point to arguing to contention -- it's more than that Todd. The project's >> policies for inclusivity have nothing to do with arguing about technical issues. > > I'm absolutely for meritocracy. I just have a high bar for what should > be considered "merit". Perhaps the PMC as a whole has a high bar. For > a system that stores my data, I'm pretty happy about that. You won't be pretty happy about it when your high bar leaves you as one of the only people in the world maintaining a 100M line code base. Especially as you get older, have kids (or not), have a family, go on to do even bigger and better things, and care even less about reading emails like this. You're going to see eventually (as will others) that the way that you grow around this Foundation (and in software in general) is to teach others how to do your job, and to attract people to your project, and not to shoo them away with exclusivity.
You call it a "high bar" to "protect your data". I call it "enjoy maintaining the software forever and never taking a vacation". It's called scalability Todd. Of course, because 1 release kills a project right? And of course there weren't 30 some odd releases before that one bad one that someone could roll back to, right? Huh?? Because this is what happens with Tomcat, or whatever other dependencies you guys have in your modularized project right? You guys call up the Tomcat PMC whenever there is a release and make sure that your Hadoop specific need is included in it right? Or that they include some bug fix that you really need? C'mon, you know that's not the way stuff works. It's called insulation. I agree there should be a plan to technically work to make sure the independent TLPs (or podlings->TLPs eventually whatever) sync up or line up -- that would be ideal. What if it doesn't happen? Will the world end? Probably not. Because there are good people hanging around that will get stuff done and make sure new TLP software foo bar technically works great as they have always done. No it doesn't. That's orthogonal? Nah, I was talking about downstream "companies" and their interests, not packagers. Why is that? Isn't that what *Apache* Big Top (incubating) is for (which also has an *Apache* download page?). +1, this could be the case. Yep agree. As for the gain, I think what you'd gain is fewer arguments about who to add to the PMC, how to add them, less maintenance of lame ASF authorization templates within *the same project*, fewer meta-discussions, and company politic spillover, and hopefully more beer to be shared by all. Note, I said *I think*. I'm only truly psychic sometimes. That's a fair statement Todd. But that's why it's not Apache Todd, or Apache Todooop. And why there are others at the Foundation, that you have to rely on, others within your project that you have to rely on, and why not everyone has the same interests. Some people'
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella projectArun C Murthy 2012-08-30, 01:47
On Aug 29, 2012, at 4:32 PM, Mattmann, Chris A (388J) wrote: > Hi Cos, > > On Aug 29, 2012, at 4:27 PM, Konstantin Boudnik wrote: > >>> Sounds cool to me. >>> >>>> - Hadoop 1.x is in maintenance mode, though it still actively gets >>>> patches so we need to consider it. The surgery necessary to split v1 >>>> Hadoop is probably not suitable for a sustaining release and not worth >>>> it at this point in the lifetime of this branch. I assume the HDFS >>>> project will then host the Hadoop 1.x branches? This implies only >>>> members of the HDFS project can commit and release. >>> >>> Why not put the 1.x stuff in Bigtop since it's global or whatever? >> >> Wearing my BigTop hat now, I encourage this audience to rush something like >> this to BigTop. If I am reading you correctly, you are asking BigTop to host >> 1.x branches of Hadoop, aren't you? I don't see how it fits in there, >> actually. But this is a separate issue that needs to involve BigTop community. > > Agreed that it would totally involve the BigTop community, and that that part > is up to them. You guys would know this way better than me, so thanks for > mentioning this issue Cos. I just kinda threw this out there but it's not a blocker > for me -- whatever makes sense here and it's a good point raised by Eli that > can probably be solved a number of different (easily solvable and documentable) ways :) I agree with Eli that we can solve it in a number of ways within the new TLPs - I'm also pretty sure it doesn't make sense to involve BigTop. I'd rather not waste bandwidth on that alley. thanks, Arun
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella projectArun C Murthy 2012-08-30, 01:52
On Aug 29, 2012, at 4:48 PM, Todd Lipcon wrote: > On Wed, Aug 29, 2012 at 4:47 PM, Konstantin Boudnik <[EMAIL PROTECTED]> wrote: >> I am curious where the arbitrary number 5 is coming from: is it reflected in >> the bylaws? > > Nope, I picked it based on Arun's earlier picking of the same number > in the YARN thread. We have no bylaws about what would happen in the > eventual TLP-ification of subcomponents, of course. I'm sure you just missed it - but, I want to set the record straight: I picked 20+ patch contributions or 10+ review/commits since *project inception*. Your pick seems to be just commits in the last 12 months. I have put forth one, please put forth another proposal if you like. However, please, do include patches, not just commits. For example, I'd propose we add llu@ for HDFS since he's done a ton of work on metrics2 recently. My bad for missing that initially - apologies Luke. I might have missed more, pls ping me or add yourself. I've put my proposal up on http://wiki.apache.org/hadoop/HDFS_MR_YARN_TLP_Proposal. We could also revisit issues like emeritus after the split to allow each project to figure out its own norms - I'd urge for that option. thanks, Arun
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella projectKonstantin Boudnik 2012-08-30, 02:59
On Wed, Aug 29, 2012 at 04:44PM, Todd Lipcon wrote:
> On Wed, Aug 29, 2012 at 4:29 PM, Mattmann, Chris A (388J) > <[EMAIL PROTECTED]> wrote: > ... > > I doubt that. Creating TLPs either directly by going to the board, or > > via going to the Incubator should involve a set of members of the > > committee (PMC) that desire to work together; that ideally trust one another; that > > are inclusive to others who jump on the list and discuss things; and that > > collect these principles into the "Apache way", and build and deliver software at > > no cost to the public via this Foundation. > > Just because we argue doesn't mean we don't desire to work together. > Smart passionate people will argue. I argue with my colleagues here at > Cloudera, I argue with Hortonworkers, and I argue with Facebookers - > it doesn't really matter much. I still enjoy getting beers with them > when I end up at conferences. No hard feelings, we're all adults, > right? (sorry for snipping...) That's truly amazing, Todd, and you certainly are lucky to be working in such a great environment! (the following isn't a stab at you, personally, so please don't take it that way) I was "terminated" from my previous job because I was expressing my opinions on this list all too freely. And those opinions happened to be misaligned with the "official party line" of my then-employer. Or was it because my opinions were hurting somebody else whom my employer didn't want to piss off at the time? Hmm... is my memory getting vague? Hardly so. That's exactly when I added the disclaimer below to my apache email account's signature. But no hard feelings, I guess, right? So let's put it straight - politics is hurting this community, but in the pursuit of the 'best interest' haven't we become a bit too complacent? Is there a way around it? I am sure there is! Will we find that way? Only time will tell, I guess.
Regards, Cos 2CAC 8312 4870 D885 8616 6115 220F 6980 1F27 E622 Disclaimer: Opinions expressed in this email are those of the author, and do not necessarily represent the views of any company the author might be affiliated with at the moment of writing.
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella projectEli Collins 2012-08-30, 05:38
On Wed, Aug 29, 2012 at 4:19 PM, Mattmann, Chris A (388J)
<[EMAIL PROTECTED]> wrote: > Hi Eli, > > On Aug 29, 2012, at 11:41 AM, Eli Collins wrote: > >> Thanks for writing up a proposal Chris. > > NP. > >> >> I think it makes sense to have Common live in HDFS at least for now, >> since it's at the bottom of the stack / dependency chain and its code >> is the most intertwined with common, and, per Arun, we tend to work on >> common stuff more than MR people. The HDFS project is really a lot >> more than HDFS, eg has all the hadoop commands, non-HDFS file system >> source, etc but that seems like an OK starting point. We need to >> figure out the committers and PMC though since the goal is to just >> have the HDFS community (vs the current Hadoop people) but the project >> will contain non-HDFS stuff. I'd like to hear from the current Hadoop >> committers and PMC members that mostly work on MR and YARN - are you >> guys OK losing your current privileges on the HDFS repo? > > Rather than ask the former question that way, I would just simply put up > a list of proposed HDFS PMC folks (yes, I keep using PMC ^_^). Then, > iterate on that. > >> Otherwise we >> haven't made much progress (ie HDFS still has multiple communities). > > ACK. > >> >> We also need to address the areas where it's not so cut and dry, eg >> where there is a single Hadoop project: >> - The Hadoop trademark, assume this lives in the HDFS project if Common does? > > Apache owns the Hadoop trademark, and the PMC helps to enforce it. Projects > don't own trademarks. But which PMC does "the PMC" refer to though given that there is no longer a Hadoop PMC?
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Eli Collins 2012-08-30, 05:46
On Wed, Aug 29, 2012 at 4:20 PM, Mattmann, Chris A (388J)
<[EMAIL PROTECTED]> wrote: > Arun, great work below. Concrete, and an actual proposal of PMC lists. > > What do folks think? I don't see how it helps. This substantially *increases* the size of the PMC for HDFS; I don't even recognize a bunch of names on this list. Unless we're actually going to try to make the HDFS project represent the people who actually contribute to and run the project, we're just replicating the current situation across 3 projects. 5+ HDFS patches in the last year seems like a pretty low bar to me.
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Mattmann, Chris A 2012-08-30, 06:06
Hey Eli,
On Aug 29, 2012, at 10:38 PM, Eli Collins wrote: >>> [..snip..] >>> - The Hadoop trademark, assume this lives in the HDFS project if Common does? >> >> Apache owns the Hadoop trademark, and the PMC helps to enforce it. Projects >> don't own trademarks. > > But which PMC does "the PMC" refer to though given that there is no > longer a Hadoop PMC? Probably the collective set of PMCs that are created, along with trademarks@, and along with other members of the Foundation. Cheers, Chris ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: [EMAIL PROTECTED] WWW: http://sunset.usc.edu/~mattmann/ ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Adjunct Assistant Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Eli Collins 2012-08-30, 06:18
On Wed, Aug 29, 2012 at 11:06 PM, Mattmann, Chris A (388J)
<[EMAIL PROTECTED]> wrote: > Hey Eli, > > On Aug 29, 2012, at 10:38 PM, Eli Collins wrote: > >>>> [..snip..] >>>> - The Hadoop trademark, assume this lives in the HDFS project if Common does? >>> >>> Apache owns the Hadoop trademark, and the PMC helps to enforce it. Projects >>> don't own trademarks. >> >> But which PMC does "the PMC" refer to though given that there is no >> longer a Hadoop PMC? > > Probably the collective set of PMCs that are created, along with trademarks@, > and along with other members of the Foundation. > But what are we enforcing as the "Hadoop" trademark if there is no longer a Hadoop product release?
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Arun C Murthy 2012-08-30, 06:31
On Aug 29, 2012, at 10:46 PM, Eli Collins wrote: > On Wed, Aug 29, 2012 at 4:20 PM, Mattmann, Chris A (388J) > <[EMAIL PROTECTED]> wrote: >> Arun, great work below. Concrete, and an actual proposal of PMC lists. >> >> What do folks think? > > I don't see how it helps. This substantially *increases* the size of > the PMC for HDFS, I don't even recognize a bunch of names on this > list. Unless we're actually going to try to make the HDFS project > represent the people who actually contribute and run the project we're > just replicating the current situation across 3 projects. 5+ hdfs > patches in the last year seems like a pretty low bar to me. Fine. Could you please provide us with an alternate for consideration? thanks, Arun
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Mattmann, Chris A 2012-08-30, 06:31
Hey Eli,
On Aug 29, 2012, at 11:18 PM, Eli Collins wrote: > On Wed, Aug 29, 2012 at 11:06 PM, Mattmann, Chris A (388J) > <[EMAIL PROTECTED]> wrote: >> Hey Eli, >> >> On Aug 29, 2012, at 10:38 PM, Eli Collins wrote: >> >>>>> [..snip..] >>>>> - The Hadoop trademark, assume this lives in the HDFS project if Common does? >>>> >>>> Apache owns the Hadoop trademark, and the PMC helps to enforce it. Projects >>>> don't own trademarks. >>> >>> But which PMC does "the PMC" refer to though given that there is no >>> longer a Hadoop PMC? >> >> Probably the collective set of PMCs that are created, along with trademarks@, >> and along with other members of the Foundation. >> > > But what are we enforcing as the "Hadoop" trademark if there is no > longer a Hadoop product release? Well Hadoop as a trademark, registered by the ASF, will remain. It doesn't go away, whether there is an explicit Hadoop TLP or product that TLP releases or not. I'd imagine as a PMC member once on the Hadoop TLP before it went away, you could choose to enforce the Hadoop trademarks by working with trademarks@ in the same way that you currently do, or don't, or whatever. And "enforce" is a loose word, since everyone's idea of "enforce" with respect to Apache PMCs and trademarks and so forth somewhat differs. My 2c. Cheers, Chris
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Sharad Agarwal 2012-08-30, 06:41
On Thu, Aug 30, 2012 at 2:48 AM, Todd Lipcon <[EMAIL PROTECTED]> wrote:
> Have we not learned our lessons from the last attempts to split? > > Let's just embrace contention as a fact of life on a high-profile > high-stakes project and get back to work.
+1. Having worked on, and wasted cycles on, the earlier project split, I agree with Todd. IMO these projects are not mature enough to fly off independently, and making that happen requires a good amount of upfront investment in terms of build/repo/JIRA/wiki/mailing lists, followed by recurring pain until the interfaces are stabilized, duplicate code, cross-stack testing and what not. It is a big mess with very little gain. There are much more pressing problems to solve, and a TLP for each of these projects is just not worth it. We are making great progress in terms of Hadoop 1.0 and 2.0. I believe we should not derail these efforts unnecessarily.
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Eli Collins 2012-08-30, 07:02
On Wed, Aug 29, 2012 at 11:31 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
> > On Aug 29, 2012, at 10:46 PM, Eli Collins wrote: > >> On Wed, Aug 29, 2012 at 4:20 PM, Mattmann, Chris A (388J) >> <[EMAIL PROTECTED]> wrote: >>> Arun, great work below. Concrete, and an actual proposal of PMC lists. >>> >>> What do folks think? >> >> I don't see how it helps. This substantially *increases* the size of >> the PMC for HDFS, I don't even recognize a bunch of names on this >> list. Unless we're actually going to try to make the HDFS project >> represent the people who actually contribute and run the project we're >> just replicating the current situation across 3 projects. 5+ hdfs >> patches in the last year seems like a pretty low bar to me. > > > Fine. Could you please provide us with an alternate for consideration? > Todd's list seems more in line with the goal of reducing project members to reflect the actual community. I see Chris' point about the community issues; however, I also see Todd's point that splitting the projects does not address these issues while bringing real overhead and rolling back things we've done recently to un-split the projects (per the vote thread, I'm in favor of combining the committer lists even if we later split projects). In short, I'm open to a project split and willing to discuss it, but I don't yet see sufficient benefits to provide a concrete proposal myself. Thanks, Eli
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Alejandro Abdelnur 2012-08-30, 07:11
I'm for the split only after we sort out how to deal with the
technical issues mentioned in this thread. IMO, unless we have a clear plan/understanding for them, this split will go sour from a technical point of view. Chris, I know you disagree on this, but given the current state of the code/interfaces I think this is a blocker for the split. Thx On Thu, Aug 30, 2012 at 12:02 AM, Eli Collins <[EMAIL PROTECTED]> wrote: > On Wed, Aug 29, 2012 at 11:31 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote: >> >> On Aug 29, 2012, at 10:46 PM, Eli Collins wrote: >> >>> On Wed, Aug 29, 2012 at 4:20 PM, Mattmann, Chris A (388J) >>> <[EMAIL PROTECTED]> wrote: >>>> Arun, great work below. Concrete, and an actual proposal of PMC lists. >>>> >>>> What do folks think? >>> >>> I don't see how it helps. This substantially *increases* the size of >>> the PMC for HDFS, I don't even recognize a bunch of names on this >>> list. Unless we're actually going to try to make the HDFS project >>> represent the people who actually contribute and run the project we're >>> just replicating the current situation across 3 projects. 5+ hdfs >>> patches in the last year seems like a pretty low bar to me. >> >> >> Fine. Could you please provide us with an alternate for consideration? >> > > Todd's list seems more in line with the goal of reducing project > members to reflect the actual community. > > I see Chris' point about the community issues, however I also see > Todd's point that splitting the projects does not address these issues > while bringing real overhead and rolling back things we've done > recently to un-split the projects (per the vote thread I'm in favor of > combining the committer lists even if we later split projects). In > short, I'm open to a project split and willing to discuss, I don't yet > see sufficient benefits to provide a concrete proposal myself. > > Thanks, > Eli -- Alejandro
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Eli Collins 2012-08-30, 07:17
On Wed, Aug 29, 2012 at 11:31 PM, Mattmann, Chris A (388J)
<[EMAIL PROTECTED]> wrote: > Hey Eli, > > On Aug 29, 2012, at 11:18 PM, Eli Collins wrote: > >> On Wed, Aug 29, 2012 at 11:06 PM, Mattmann, Chris A (388J) >> <[EMAIL PROTECTED]> wrote: >>> Hey Eli, >>> >>> On Aug 29, 2012, at 10:38 PM, Eli Collins wrote: >>> >>>>>> [..snip..] >>>>>> - The Hadoop trademark, assume this lives in the HDFS project if Common does? >>>>> >>>>> Apache owns the Hadoop trademark, and the PMC helps to enforce it. Projects >>>>> don't own trademarks. >>>> >>>> But which PMC does "the PMC" refer to though given that there is no >>>> longer a Hadoop PMC? >>> >>> Probably the collective set of PMCs that are created, along with trademarks@, >>> and along with other members of the Foundation. >>> >> >> But what are we enforcing as the "Hadoop" trademark if there is no >> longer a Hadoop product release? > > Well Hadoop as a trademark, registered by the ASF, will remain. It doesn't go away, > whether there is an explicit Hadoop TLP or product that TLP releases or not. I'd imagine as a PMC member once > on the Hadoop TLP before it went away, you could choose to enforce the Hadoop trademarks by working with > trademarks@ in the same way that you currently do, or don't, or whatever. > > And "enforce" is a loose word, since everyone's idea of "enforce" with respect to Apache > PMCs and trademarks and so forth somewhat differs. > > My 2c. > I get that part, just not sure what we'd be enforcing. A concrete proposal will need to figure out what this means once there is no such thing as a Hadoop release. See http://wiki.apache.org/hadoop/Defining%20Hadoop for some relevant background on an old proposal that didn't go anywhere.
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Konstantin Shvachko 2012-08-30, 10:12
On Wed, Aug 29, 2012 at 4:54 PM, Mattmann, Chris A (388J)
<[EMAIL PROTECTED]> wrote: > OK I lied and said I wouldn't reply :)
Long thread. I just picked one of Chris's emails (as the initiator's) at random to reply to.
Chris,
You are basically saying there's been a history of community problems in the Hadoop project, and proposing a technical solution, splitting the project by replicating the source base under three new names, implying that this will solve the community problems we (the Hadoop community) are facing.
I see several issues.
1. There are other ways to split the project. We essentially have a "natural" split of the project already in place. Hadoop 1, Hadoop 2, Hadoop 0.23 and trunk are in a sense competing projects by themselves, with their own contributors and release cycles.
2. From a technical (not community) viewpoint, your "svn copy" is an ugly approach, as it creates a lot of code duplication and will result in a maintenance nightmare and/or will require many man-months to fix. My point is that you cannot neglect "technical issues" when you solve community problems.
3. I am as skeptical as Todd that the community problems will be solved by simply TLP-ing the three projects. Two years ago Hadoop was in crisis as vendors were producing their own releases and calling them Hadoop. I think this was solved, but "poor community behavior" and contention remained, embrace them or not.
4. Having said the above, separating the projects seems reasonable (see timing, though). HDFS will inevitably have to inherit and maintain most of Common. I totally understand the frustration of people who just put a huge effort into merging the sources back under a common root.
5. Timing is important. Waiting until Hadoop 2 is stable, as Arun suggested earlier, would probably be too long. Doing it next week, without discussing and solving the technical issues listed in the thread, would be premature. I think the Hadoop 0.23.3 release, backed by Yahoo production, has the potential to become the next stable version, letting the project move ahead from the four-year-old code base.
We should help that happen first, and make the necessary preparations for the split in the meantime. Thanks, --Konstantin
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Arun C Murthy 2012-08-30, 10:25
Konstantin,
On Aug 30, 2012, at 3:12 AM, Konstantin Shvachko wrote: > > 5. Timing is important. > Waiting until Hadoop 2 is stable as Arun suggested earlier would > probably be too long. > Doing it next week, without discussing and solving technical issue > listed in the thread would be premature. > I think Hadoop 0.23.3 release backed by Yahoo production has a > potential to become > the next stable version, letting the project to move ahead off the > four year old code base. > We should help that happen first, and do necessary preparations for > the split in the mean time. Agreed. This seems very reasonable - this is along the lines of what I was proposing when I said we should split *before* we declare hadoop-2 as GA (not after Konst, no worries). Arun
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Arun C Murthy 2012-08-30, 11:00
On Aug 30, 2012, at 3:12 AM, Konstantin Shvachko wrote: > 2. From technical (not community) viewpoint your "svn copy" is an ugly > approach, > as it creates a lot of code duplication and will result in a > maintenance nightmare or / and > will require many man-months to fix. My point is that you cannot > neglect "technical issues" when you solve community problems.
Agreed, Konstantin. I don't think Chris was being serious here; it was merely *one* way forward. There are, easily, better ways to solve this.
The big cross-project dependencies are IPC/RPC, Security and Metrics2. Some others are the network topology APIs, etc. They need to be marked Public/Stable. We need to maintain compatibility across a major (stable) release anyway. This is true for every other Public/Stable API.
So, *technically*, the requirements are:
a) Ensure projects only use Public/Stable APIs.
b) Maintain compatibility for Public/Stable APIs within a major release.
c) Clearly, key components like IPC, Metrics2, Security etc. *should* be marked stable by the time the ersatz hadoop-2 codebase is declared 'stable'.
None of these seem like the fashionably *scary* technical issues some people are using to justify blocking the way forward.
And, no, YARN/MR aren't the only downstream projects in this mix: HBase, for example, uses Hadoop Metrics2 and our security APIs. We need to support compatibility for HBase anyway. There are several other projects in the same boat. Pig/Hive need FileSystem, Security & MR APIs. This is just the *reality* of being at the bottom of the stack. Yes, there is work left, but that work is something we need to do with or without the split.
Furthermore, yes, the previous split/unsplit was painful. However, beyond that, we have made progress across several dimensions which should make this one smoother:
a) Mavenization has helped a *lot*.
b) Unlike the previous attempt, HDFS2 & YARN (v/s HDFS1 & MR1) no longer share the same run-time scripts etc.
c) We have been fairly good at following through on our stability/visibility guarantees on APIs.
As a result, I don't buy the *this is technically impossible* argument. As Konstantin suggested, we could spend the next few weeks/months preparing. Even after the split we would be in an alpha/beta stage, whereby we can recover from mistakes at the cost of a few extra HDFS alpha/beta releases for the sake of the MR/YARN projects, which seems like an acceptable cost given that there are several volunteers to RM releases.
Last but not least, the previous split failed because the overall community did not invest in ensuring its success. That's clearly *not* the case this time around. I'm very confident of that. Arun
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Arun C Murthy 2012-08-30, 12:29
Eli,
On Aug 30, 2012, at 12:02 AM, Eli Collins wrote: > On Wed, Aug 29, 2012 at 11:31 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote: >> >> On Aug 29, 2012, at 10:46 PM, Eli Collins wrote: >> >>> On Wed, Aug 29, 2012 at 4:20 PM, Mattmann, Chris A (388J) >>> <[EMAIL PROTECTED]> wrote: >>>> Arun, great work below. Concrete, and an actual proposal of PMC lists. >>>> >>>> What do folks think? >>> >>> I don't see how it helps. This substantially *increases* the size of >>> the PMC for HDFS, I don't even recognize a bunch of names on this >>> list. Unless we're actually going to try to make the HDFS project >>> represent the people who actually contribute and run the project we're >>> just replicating the current situation across 3 projects. 5+ hdfs >>> patches in the last year seems like a pretty low bar to me. >> >> >> Fine. Could you please provide us with an alternate for consideration? >> > Ok, I'll bite - I find learned helplessness very frustrating. I modified my proposal to keep the current distinction of Committers v/s PMC for all projects, i.e. all projects keep the list of committers I had, but each PMC is restricted to the intersection of the current PMC and the respective project's committer list: http://wiki.apache.org/hadoop/HDFS_MR_YARN_TLP_Proposal > Todd's list seems more in line with the goal of reducing project > members to reflect the actual community. > I do hope you merely missed my response to Todd's proposal, which needs a lot more work: http://s.apache.org/OYK. If you are going to support it, please fix it first. If not, I'll assume you agree with my modifications, since they address your concerns about *substantially increasing* the HDFS PMC size. Arun
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Andrew Purtell 2012-08-30, 13:46
+1
Please don't do this again. On Thu, Aug 30, 2012 at 12:18 AM, Todd Lipcon <[EMAIL PROTECTED]> wrote: > Have we not learned our lessons from the last attempts to split? > > The issues in our community, which I think Chris is referring to, do > not generally revolve around project boundaries. It's not the case > that the HDFS community wants to go one way and the MR/YARN community > wants to go another, and we get into a conflict around it. If it were, > then splitting into separate TLPs would make a ton of sense. > > Instead, the issues are usually _within_ a component. So, if we split > into 3 TLPs, then we'll just have 3 TLPs, each of which is just as > contentious as before. > > Let's just embrace contention as a fact of life on a high-profile > high-stakes project and get back to work. > > I wasted nearly a month undoing the mess of the last attempt, and I > don't see why this time it would go any better. -1 from my perspective > on splitting again at this point. Perhaps if we get to the point that > we're never making cross-project commits it makes sense, but we're not > there still. > > -Todd > > On Wed, Aug 29, 2012 at 1:40 PM, Alejandro Abdelnur <[EMAIL PROTECTED]> > wrote: > > I volunteer to help cleanup/normalize Maven stuff. > > > > Thx > > > > On Wed, Aug 29, 2012 at 1:34 PM, Tom White <[EMAIL PROTECTED]> wrote: > >> Eric - I agree with Common being included in HDFS. That's what I meant > >> by Common not having a clear enough mission to be a TLP by itself. > >> > >> Arun - I'm happy to RM some of the upcoming MR releases too. Also to > >> help out with the work on audience annotations and compatibility. > >> > >> Cheers, > >> Tom > >> > >> On Wed, Aug 29, 2012 at 7:22 PM, Arun C Murthy <[EMAIL PROTECTED]> > wrote: > >>> On Aug 29, 2012, at 10:04 AM, Arun C Murthy wrote: > >>>> On Aug 29, 2012, at 10:02 AM, Suresh Srinivas wrote: > >>>>> > >>>>> Robert and Alejandro have brought up good questions. 
Here are my > thoughts: > >>>>> - For first one or two releases all the projects can coordinate and > do the > >>>>> releases together. This should help simplify the immediate work > needed. > >>>>> This should also help in us meeting the release timelines that we are > >>>>> working towards. As the split makes progress, this cross project > >>>>> coordination will no longer be necessary. I volunteer to RM these > releases > >>>>> and do the needed co-ordination from HDFS. > >>>> > >>>> > >>>> +1 seems like a reasonable first step. Thanks for volunteering Suresh. > >>> > >>> Also, I'd say we make at least 3-4 alpha/beta releases in this shape. > >>> > >>> I volunteer to RM for MR/YARN releases and work with Suresh. > >>> > >>> Arun > >>> > > > > > > > > -- > > Alejandro > > > > -- > Todd Lipcon > Software Engineer, Cloudera > -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Mattmann, Chris A 2012-08-30, 13:51
Hi Konstantin,
On Aug 30, 2012, at 3:12 AM, Konstantin Shvachko wrote: > On Wed, Aug 29, 2012 at 4:54 PM, Mattmann, Chris A (388J) > <[EMAIL PROTECTED]> wrote: >> OK I lied and said I wouldn't reply :) > > Long thread. I just picked a random Chris's (as the initiator) email to reply. > > Chris, > You are basically saying there's been a history of community problems > in Hadoop project, > and proposing a technical solution to split the project by replicating > the source base under three new names, > implying that this will solve the community problems we (the Hadoop > community) are facing. Well, actually, the replication of the source code is just a small part of what I was proposing (and one that I don't really care about, and that isn't crucial to what I'm saying). Breaking the project up into groups of individuals that actually share similar views, that can reach consensus on things (besides technical issues), and that work in the Apache way is what I was really proposing. > > I see several issues. > > 1. There are other ways to split the project. > We essentially have a "natural" split of the project already in place. > Hadoop 1, Hadoop 2, Hadoop 0.23, the Trunk > are in a sense competing projects by themselves, with own contributors > and release cycles. +1, that's a great split too. I'm not wed to simply splitting the project along components, or systems or whatever. Whatever makes sense to get communities of people working together at Apache is what I'm after. Community != technical. > > 2. From technical (not community) viewpoint your "svn copy" is an ugly > approach, > [..snip...] +1, it totally is ugly -- I used it for illustration in the hope that the Hadoop technical experts could come up with a better one and stop using it as an excuse not to fix the community problems. > > 3. I am as skeptical as Todd that the community problems will be > solved by simply TLP-ing the three projects.
> Two years ago Hadoop was in crises as vendors were producing their own > releases calling it Hadoop. > I think this was solved, but "poor community behavior" and contentions > remained, embrace them or not. Vendors still produce their own releases on top of Hadoop, whether they call them Hadoop or not. That problem isn't fixed, and won't be fixed -- it's grown too much. > > 4. Having said the above, separating the projects seems reasonable. > (See timing though) > HDFS will inevitable have to inherit and maintain most of Common. > Totally understand frustration of people who just put a huge effort > into merging > the sources back under common root. Me too, which is why I'm not urging for this or that, or for how to solve these types of things. I'm not sure, but I also know that it's most important to get projects that understand how things work here at Apache. > > 5. Timing is important. > Waiting until Hadoop 2 is stable as Arun suggested earlier would > probably be too long. > Doing it next week, without discussing and solving technical issue > listed in the thread would be premature. > I think Hadoop 0.23.3 release backed by Yahoo production has a > potential to become > the next stable version, letting the project to move ahead off the > four year old code base. > We should help that happen first, and do necessary preparations for > the split in the mean time. Sounds reasonable to me. Thanks for your feedback. Cheers, Chris
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Andrew Purtell 2012-08-30, 14:11
As a direct Apache software product consumer and sometimes contributor, I
also experienced firsthand the pain of the project splits. It was not possible to build an installable release. It may have been many days or weeks before that was cured by a re-merge. I gave up after burning too many hours on it, went back to the 1.0 code base, and came back only after the damage was repaired. It's also frustrating to hear, even if it is just one person's proposal, that we have spent months preparing to stabilize our next production deployment based on the 2.0 branch, with the expectation that it will be the new stable release, but now maybe 0.23 will be the new stable. 0.23 is quite backwards in comparison and missing all of the critical HA HDFS work. This thread seems to be becoming a competition for which is the more radical proposal to snatch defeat from the jaws of success. These proposals seem to be made with a total lack of care for the end user. From my point of view, things were going reasonably well until there was this sudden turn into lunacy. I am positive this kind of "foundation" / PMC / project / administrivia tinkering is what will fragment or disband the Hadoop community of users and contributors, not disagreements between committers. A Hadoop competitor couldn't be happier. On Thu, Aug 30, 2012 at 1:12 PM, Konstantin Shvachko <[EMAIL PROTECTED]> wrote: > On Wed, Aug 29, 2012 at 4:54 PM, Mattmann, Chris A (388J) > <[EMAIL PROTECTED]> wrote: > > OK I lied and said I wouldn't reply :) > > Long thread. I just picked a random Chris's (as the initiator) email to > reply. > > Chris, > You are basically saying there's been a history of community problems > in Hadoop project, > and proposing a technical solution to split the project by replicating > the source base under three new names, > implying that this will solve the community problems we (the Hadoop > community) are facing. > > I see several issues. > > 1. There are other ways to split the project. > We essentially have a "natural" split of the project already in place.
> Hadoop 1, Hadoop 2, Hadoop 0.23, the Trunk > are in a sense competing projects by themselves, with own contributors > and release cycles. > > 2. From technical (not community) viewpoint your "svn copy" is an ugly > approach, > as it creates a lot of code duplication and will result in a > maintenance nightmare or / and > will require many man-months to fix. My point is that you cannot > neglect "technical issues" when you solve community problems. > > 3. I am as skeptical as Todd that the community problems will be > solved by simply TLP-ing the three projects. > Two years ago Hadoop was in crises as vendors were producing their own > releases calling it Hadoop. > I think this was solved, but "poor community behavior" and contentions > remained, embrace them or not. > > 4. Having said the above, separating the projects seems reasonable. > (See timing though) > HDFS will inevitable have to inherit and maintain most of Common. > Totally understand frustration of people who just put a huge effort > into merging > the sources back under common root. > > 5. Timing is important. > Waiting until Hadoop 2 is stable as Arun suggested earlier would > probably be too long. > Doing it next week, without discussing and solving technical issue > listed in the thread would be premature. > I think Hadoop 0.23.3 release backed by Yahoo production has a > potential to become > the next stable version, letting the project to move ahead off the > four year old code base. > We should help that happen first, and do necessary preparations for > the split in the mean time. > > Thanks, > --Konstantin >
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Aaron T. Myers 2012-08-30, 14:23
+1
I could not agree more with everything Andrew has written below. Things have been running really quite smoothly for months (a year?) now. We've had one rather small disagreement that we're about to have cleared up, and now suddenly we're talking about rearranging the whole thing. I still fail to see how this could serve to help Hadoop. -- Aaron T. Myers Software Engineer, Cloudera On Thu, Aug 30, 2012 at 7:11 AM, Andrew Purtell <[EMAIL PROTECTED]> wrote: > As a direct Apache software product consumer and sometimes contributor, I > also experienced firsthand the pain of the project splits. It was not > possible to build an installable release. It may have been many days or > weeks before that was cured by a re-merge. I gave up after burning too many > hours on it, went back to the 1.0 code base, and came back only after the > damage was repaired. > > It's also frustrating to hear, even if just one person's proposal, that we > have spent months preparing to stabilize our next production deployment > based on the 2.0 branch, with the expectation that it will be the new > stable, but now maybe 0.23 will be the new stable. 0.23 is quite backwards > in comparison and missing all of the critical HA HDFS work. > > This thread seems to be becoming a competition for which is the more > radical proposal to snatch defeat from the jaws of success. > > These proposals seem to be made with a total lack of care for the end user. > > From my point of view, things were going reasonably well until suddenly > there is this sudden turn into lunacy. I am positive this kind of > "foundation" / PMC / project / administrivia tinkering is what will > fragment or disband the Hadoop community of users and contributors, not > disagreements between committers. A Hadoop competitor couldn't be happer.
> > On Thu, Aug 30, 2012 at 1:12 PM, Konstantin Shvachko > <[EMAIL PROTECTED]>wrote: > > > On Wed, Aug 29, 2012 at 4:54 PM, Mattmann, Chris A (388J) > > <[EMAIL PROTECTED]> wrote: > > > OK I lied and said I wouldn't reply :) > > > > Long thread. I just picked a random Chris's (as the initiator) email to > > reply. > > > > Chris, > > You are basically saying there's been a history of community problems > > in Hadoop project, > > and proposing a technical solution to split the project by replicating > > the source base under three new names, > > implying that this will solve the community problems we (the Hadoop > > community) are facing. > > > > I see several issues. > > > > 1. There are other ways to split the project. > > We essentially have a "natural" split of the project already in place. > > Hadoop 1, Hadoop 2, Hadoop 0.23, the Trunk > > are in a sense competing projects by themselves, with own contributors > > and release cycles. > > > > 2. From technical (not community) viewpoint your "svn copy" is an ugly > > approach, > > as it creates a lot of code duplication and will result in a > > maintenance nightmare or / and > > will require many man-months to fix. My point is that you cannot > > neglect "technical issues" when you solve community problems. > > > > 3. I am as skeptical as Todd that the community problems will be > > solved by simply TLP-ing the three projects. > > Two years ago Hadoop was in crises as vendors were producing their own > > releases calling it Hadoop. > > I think this was solved, but "poor community behavior" and contentions > > remained, embrace them or not. > > > > 4. Having said the above, separating the projects seems reasonable. > > (See timing though) > > HDFS will inevitable have to inherit and maintain most of Common. > > Totally understand frustration of people who just put a huge effort > > into merging > > the sources back under common root. > > > > 5. Timing is important. 
> > Waiting until Hadoop 2 is stable as Arun suggested earlier would > > probably be too long. > > Doing it next week, without discussing and solving technical issue > > listed in the thread would be premature. > > I think Hadoop 0.23.3 release backed by Yahoo production has a
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Brock Noland 2012-08-30, 14:43
+1
As an observer, user, and sometimes contributor, I feel as though the project has been going smoothly over the past year. As such, I was quite surprised when this popped up. On Thu, Aug 30, 2012 at 9:23 AM, Aaron T. Myers <[EMAIL PROTECTED]> wrote: > +1 > > I could not agree more with everything Andrew has written below. Things > have been running really quite smoothly for months (a year?) now. We've had > one rather small disagreement, that we're about to have cleared up, and now > suddenly we're talking about rearranging the whole thing. I still fail to > see how this could serve to help Hadoop. >
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Doug Cutting 2012-08-30, 16:17
On Wed, Aug 29, 2012 at 7:29 PM, Mattmann, Chris A (388J)
<[EMAIL PROTECTED]> wrote: > You're right, it's not project boundaries, it's poor community behavior, > and general umbrella-project-ness. The primary problem I see with umbrellas is that the PMC isn't able to accurately represent the developer community. Hadoop used to have that problem, when HBase, etc. were subprojects and most PMC members were not involved in those subprojects. Currently this is less of a problem. Many PMC members are involved in several different parts of the project and most PMC members follow all the developer mailing lists. Hadoop at present thus has some semblance to an umbrella but is by no means a classic umbrella. > One aspect I've seen is that exclusivity of allowing people to become > PMC members on the project, and the separation of PMC from C. > Other things I've seen are the use of technical justifications or complexity > issues as an excuse for the exclusivity, as an excuse for drawing boundaries > between project committers and PMC members, and then between specific > products that the project and community as a whole releases, and finally > other things I've seen include external interests influencing the way that > business is done around here (need for releases in downstream companies, > or projects driving upstream, Apache decisions, which are supposed to be > independent of any lone company, or set of companies -- it's individuals here > that do the work). I am unconvinced that splitting Hadoop into three projects is a panacea for these issues. For example, adding committers to the sub-lists has been contentious even among the members of those sublists. Splitting is perhaps a better long-term structure for the project. But it should be done slowly and carefully. Moving too quickly could cause a lot of extra work for a lot of people, both in the project and downstream. A series of incremental steps should prove less painful. For example, the YARN developers might propose that they fork to a new TLP. 
The YARN code could then be removed from the mother project's trunk but remain in branches for compatible bugfix releases. Downstream projects could start adding a dependency on the YARN project once it makes releases. Doug
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Inder.dev Java 2012-08-30, 16:33
Hi Hadoopers,
I am a user and a big fan of Hadoop. I can see a lot of great discussion here. Most of it is about access rights and the technical problems of a split. I don't see many of the members contributing to the projects, yet it looks like they have access. If they are not contributing, why is access required? I generally watch the mail in the community. Some people have not even published anything, yet I can see their names in the list you have proposed. https://issues.apache.org/jira/secure/ConfigureReport.jspa?projectOrFilterId=project-12310942&statistictype=assignees&selectedProjectId=12310942&reportKey=com.atlassian.jira.plugin.system.reports%3Apie-report&Next=Next https://issues.apache.org/jira/secure/ConfigureReport.jspa?projectOrFilterId=project-12310941&statistictype=assignees&selectedProjectId=12310942&reportKey=com.atlassian.jira.plugin.system.reports%3Apie-report&Next=Next I am curious to know how that many people got access in Map Reduce/HDFS. When you consider them as separate projects, Todd's proposed list is closer to the above links and looks like it reflects the true contributors. Take the correct information rather than messing things up... I too think it is good to split, use lists like the one Todd proposed, and take the list for YARN from Arun. This is just my thought; I may not know many things here. Please ignore this mail if I have misunderstood something about the community. -thx On Thu, Aug 30, 2012 at 7:22 AM, Arun C Murthy <[EMAIL PROTECTED]> wrote: > > On Aug 29, 2012, at 4:48 PM, Todd Lipcon wrote: > > > On Wed, Aug 29, 2012 at 4:47 PM, Konstantin Boudnik <[EMAIL PROTECTED]> > wrote: > >> I am curious where the arbitrar numbery 5 is coming from: is it > reflected in > >> the bylaws? > > > > Nope, I picked it based on Arun's earlier picking of the same number > > in the YARN thread. We have no bylaws about what would happen in the > > eventual TLP-ification of subcomponents, of course.
> > I'm sure you just missed it - but, I want to set the record straight: I > picked 20+ patch contributions or 10+ review/commits since *project > inception*. > Your pick seems to be just commits in last 12 months. I have put forth > one, please put forth another proposal if you like. However, please, do > include patches, not just commits. > > For e.g. I'd propose we add llu@ for HDFS since he's done a ton of work > on metrics2 recently. My bad for missing that initially - apologies Luke. I > might have missed more, pls ping me or add yourself. I've put my proposal > up on http://wiki.apache.org/hadoop/HDFS_MR_YARN_TLP_Proposal. > > We could also revisit issues like emeritus after the split to allow each > project to figure it's own norms - I'd urge for that option. > > thanks, > Arun > >
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Doug Cutting 2012-08-30, 17:00
On Thu, Aug 30, 2012 at 12:33 PM, Inder.dev Java <[EMAIL PROTECTED]> wrote:
> I am curious to know how that many people got access in Map Reduce/HDFS. Many of these are folks who were more active in the past. Hadoop is now 6.5 years old. At Apache, merit does not expire: http://www.apache.org/dev/committers.html#committer-set-term Doug
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Owen O'Malley 2012-08-30, 18:25
On Thu, Aug 30, 2012 at 10:00 AM, Doug Cutting <[EMAIL PROTECTED]> wrote:
> At Apache, merit does not expire: > Agreed. The contributions from the beginning of the project need to be considered. Clearly YARN was heavily influenced by and borrowed heavily from MapReduce. The fact that a NodeManager isn't the same name as a TaskTracker doesn't mean it doesn't do the same things using some of the same code. Based on that, I'd propose that the MapReduce committer list be cloned as the Yarn committer list too. -- Owen
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Chris Douglas 2012-08-31, 01:24
+1 for splitting the projects
+1 for adding all MR contributors to Yarn
I may have missed its mention in this thread, but maintaining the 1.x branch is probably the most awkward technical hurdle. I'm not sure how that should be managed if the projects are split. In one strategy, it can be left with Common+HDFS until 2.x stabilizes. The tasks that are simpler in a unified project- releases, cross-project patches, etc- are relatively rare, but all dev has paid a tax. That acknowledged, as Arun points out: the half-measures that have made the split painful can be fixed and enthusiasm/resources appear to be available for that. As long as TLPs are reconciled quickly and decisively, this can be successful. Without dedicated resources, we can expect the same result as before. As for what this accomplishes: each subproject is more approachable on its own. I don't think it will alleviate political tensions, nor are such tensions inherently unhealthy. But a split can limit the scope to the particular subproject and its interests. It's also easier for collaborators to engage the subset of contributors charged with its roadmap: Pig/Hive should be able to wrangle MapReduce and Yarn folks on their dev list, as HBase should engage HDFS without importing extra context. As another practical matter: we should change the bylaws so emeritus PMC members/committers can reinstate themselves without a vote. I expect many people, including myself, would have no problem signaling periods of inactivity if project politics were out of the equation. -C On Tue, Aug 28, 2012 at 7:33 PM, Mattmann, Chris A (388J) <[EMAIL PROTECTED]> wrote: > [decided to minimize traffic and to simply put this in one thread] > > Hi Guys, > > See the recent discussion on these threads: > > YARN as its own Hadoop "sub project": http://s.apache.org/WW1 > Maintain a single committer list for the Hadoop project: http://s.apache.org/Owx > > ...and just pay attention to the Hadoop project over the last 3-4 years.
It's operating > as a single project, that's masking separate communities that themselves are really > separate ASF projects. > > At the ASF, this has been a problem area called "umbrella" projects and over the years, > all I've seen from them is wasted bandwidth, artificial barriers and the inventions of > new ways to perform process mongering and to reduce the fun in developing software > at this fantastic foundation. > > I've talked about umbrella projects enough. We've diverted conversation enough. > Enough people have tried to act like there is some technical mumbo jumbo that is > preventing the eventual act of higher power that I myself hope comes should these > discussions prove unfruitful through normal means. > > *these. are. separate. projects.* > *there.are.not.blocker.issues.from.spinning.out.these.projects.as.their.own.communities* > > In this email: http://s.apache.org/rSm > > And in the 2 subsequent follow ons in that thread, I've outlined a process that I'll copy > through below for splitting these projects into their own TLPs: > > -----snip > Process: > > 0. [DISCUSS] thread for <TLP name> in which you talk about #1 and #2 below, potentially draft resolution too. > > 1. Decide on an initial set of *PMC* members. I urge each new TLP to adopt PMC==C. See reasons I've > already discussed. > > 2. Decide on a chair. Try not to VOTE for this explicitly, see if can be discussed and consensus > can be reached (just a thought experiment). VOTE if necessary. > > 3. [VOTE] thread for <TLP name> > > 4. Create Project: > a. paste resolution from #0 to board@ or; > b. go to general@incubator and start new Incubator project. > > 5. infrastructure set up. > MLs moving; new UNIX groups; website setup; > SVN setup like this: > > svn copy -m "MR TLP." https://svn.apache.org/repos/asf/hadoop/ https://svn.apache.org/repos/asf/<insert cool MR name>; or > svn copy -m "YARN TLP." 
https://svn.apache.org/repos/asf/hadoop/ https://svn.apache.org/repos/asf/<insert cool YARN name>; or
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Devaraj Das 2012-08-31, 01:28
Andrew's points are fair IMHO. In general, I think it makes sense to have the TLPs but we aren't there yet (as others have pointed out). I'd propose that we should think about the timelines (maybe an appropriate time is when we have Hadoop-2.0 GA'ed).
On Aug 30, 2012, at 7:11 AM, Andrew Purtell wrote: > As a direct Apache software product consumer and sometimes contributor, I > also experienced firsthand the pain of the project splits. It was not > possible to build an installable release. It may have been many days or > weeks before that was cured by a re-merge. I gave up after burning too many > hours on it, went back to the 1.0 code base, and came back only after the > damage was repaired. > > It's also frustrating to hear, even if just one person's proposal, that we > have spent months preparing to stabilize our next production deployment > based on the 2.0 branch, with the expectation that it will be the new > stable, but now maybe 0.23 will be the new stable. 0.23 is quite backwards > in comparison and missing all of the critical HA HDFS work. > > This thread seems to be becoming a competition for which is the more > radical proposal to snatch defeat from the jaws of success. > > These proposals seem to be made with a total lack of care for the end user. > > From my point of view, things were going reasonably well until suddenly > there is this sudden turn into lunacy. I am positive this kind of > "foundation" / PMC / project / administrivia tinkering is what will > fragment or disband the Hadoop community of users and contributors, not > disagreements between committers. A Hadoop competitor couldn't be happer. > > On Thu, Aug 30, 2012 at 1:12 PM, Konstantin Shvachko > <[EMAIL PROTECTED]>wrote: > >> On Wed, Aug 29, 2012 at 4:54 PM, Mattmann, Chris A (388J) >> <[EMAIL PROTECTED]> wrote: >>> OK I lied and said I wouldn't reply :) >> >> Long thread. I just picked a random Chris's (as the initiator) email to >> reply. 
>> >> Chris, >> You are basically saying there's been a history of community problems >> in Hadoop project, >> and proposing a technical solution to split the project by replicating >> the source base under three new names, >> implying that this will solve the community problems we (the Hadoop >> community) are facing. >> >> I see several issues. >> >> 1. There are other ways to split the project. >> We essentially have a "natural" split of the project already in place. >> Hadoop 1, Hadoop 2, Hadoop 0.23, the Trunk >> are in a sense competing projects by themselves, with own contributors >> and release cycles. >> >> 2. From technical (not community) viewpoint your "svn copy" is an ugly >> approach, >> as it creates a lot of code duplication and will result in a >> maintenance nightmare or / and >> will require many man-months to fix. My point is that you cannot >> neglect "technical issues" when you solve community problems. >> >> 3. I am as skeptical as Todd that the community problems will be >> solved by simply TLP-ing the three projects. >> Two years ago Hadoop was in crises as vendors were producing their own >> releases calling it Hadoop. >> I think this was solved, but "poor community behavior" and contentions >> remained, embrace them or not. >> >> 4. Having said the above, separating the projects seems reasonable. >> (See timing though) >> HDFS will inevitable have to inherit and maintain most of Common. >> Totally understand frustration of people who just put a huge effort >> into merging >> the sources back under common root. >> >> 5. Timing is important. >> Waiting until Hadoop 2 is stable as Arun suggested earlier would >> probably be too long. >> Doing it next week, without discussing and solving technical issue >> listed in the thread would be premature. >> I think Hadoop 0.23.3 release backed by Yahoo production has a >> potential to become >> the next stable version, letting the project to move ahead off the >> four year old code base. 
>> We should help that happen first, and do necessary preparations for
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Vinod Kumar Vavilapalli 2012-08-31, 03:35
+1 for splitting the projects.
On Aug 30, 2012, at 6:24 PM, Chris Douglas wrote: > The tasks that are simpler in a unified project- releases, > cross-project patches, etc- are relatively rare, but all dev has paid > a tax. Agreed. > That acknowledged, as Arun points out: the half-measures that > have made the split painful can be fixed and enthusiasm/resources > appear to be available for that. As long as TLPs are reconciled > quickly and decisively, this can be successful. Without dedicated > resources, we can expect the same result as before. I am willing to volunteer myself to help with whatever it takes to accomplish this. Even the last time around, I did my bit to make the split happen, the fruits of which aren't all lost today. Thanks, +Vinod Kumar Vavilapalli Hortonworks Inc. http://hortonworks.com/
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Andrew Purtell 2012-08-31, 06:02
Looking at the voting, it appears YARN wants to become a TLP RIGHT NOW but
at the price of the complete decoherence of the Apache Hadoop platform. For all of us who have invested in the Apache Hadoop platform, how does this benefit us? Certainly our interests seem to get little consideration with this plan to just blow everything up tomorrow. How does a downstream project that imports HDFS and MapReduce coordinate the shared dependencies with those new projects? Guava, for example. One could have a multi-way library incompatibility problem; this has already happened in the large with HDFS, HBase, and Pig. It's DLL hell magnified 3 or 4 times just in the smoking ruins of "core". The obvious answer is: once these pieces are moving in different trajectories at different rates, end users and downstream projects will be forced to negotiate with many parties, and those parties explicitly won't care about the issues concerning another, according to this discussion. YARN must have broken our minicluster-based MapReduce tests 5 times over the last year. HDFS took up a certain version of Guava and this required us to refactor some code to match that version. We had a coherent group of committers to assist us then but that would go away. Proponents of the split seem to want exactly this situation. BigTop was suggested as a vehicle for addressing that concern but then explicitly rejected on this thread. A commercial vendor looking to torpedo the ability of anyone to build something on Apache Hadoop directly couldn't come up with a better plan, because only a full-time operation can be expected to have the resources to harmonize the pieces plus all of their dependencies with build patches, code wrangling, testing, testing, testing. Volunteer contributor and committer time is a precious gift. I wonder if the many professional full-time Hadoop devs voting here have lost sight of this. Pushing your integration work downstream doesn't mean resources will be there to pick it up.
Downstream projects could be forced to reluctantly abandon working with Apache releases for a commercial distribution such as CDH, or the MapR platform. Or, they will be unable to move from a "known good" combination in the face of a combinatorial explosion of dependency changes, so their general utility to the end user steadily declines. Maybe the consensus is that is acceptable, but I would find that kind of a sad ending to this remarkable project. On Friday, August 31, 2012, Devaraj Das wrote: > Andrew's points are fair IMHO. In general, I think it makes sense to have > the TLPs but we aren't there yet (as others have pointed out). I'd propose > that we should think about the timelines (maybe an appropriate time is when > we have Hadoop-2.0 GA'ed). > > On Aug 30, 2012, at 7:11 AM, Andrew Purtell wrote: > > > As a direct Apache software product consumer and sometimes contributor, I > > also experienced firsthand the pain of the project splits. It was not > > possible to build an installable release. It may have been many days or > > weeks before that was cured by a re-merge. I gave up after burning too > many > > hours on it, went back to the 1.0 code base, and came back only after the > > damage was repaired. > > > > It's also frustrating to hear, even if just one person's proposal, that > we > > have spent months preparing to stabilize our next production deployment > > based on the 2.0 branch, with the expectation that it will be the new > > stable, but now maybe 0.23 will be the new stable. 0.23 is quite > backwards > > in comparison and missing all of the critical HA HDFS work. > > > > This thread seems to be becoming a competition for which is the more > > radical proposal to snatch defeat from the jaws of success. > > > > These proposals seem to be made with a total lack of care for the end > user. > > > > From my point of view, things were going reasonably well until suddenly > > there is this sudden turn into lunacy. 
I am positive this kind of > > "foundation" / PMC / project / administrivia tinkering is what will Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Mattmann, Chris A 2012-08-31, 06:15
Hi Andrew,
How many new Apache Foundation *members* has the Hadoop PMC added over the past 3-4 years, and by whom (the answer to this question might surprise you)? The thing you and others continue not to see is that the ASF isn't about the most superior technical solutions, or the best refactorings to prevent Google Guava dependencies, the ASF is about *community* _over_ *code*. Period. The metrics that the Foundation and its members are interested in are the metrics that demonstrate the health of the project. Technical prowess and market-share are great, as are diverse, hungry, downstream user communities. But the ASF is here to create communities, communities that work together to develop code for public good at no charge to the public. Scope out Board resolutions to create projects and read the repetitive text in them -- there's a pattern there that elucidates this. Also, the project members and community members here could slice and dice the project into 50 different Top Level Projects, but it doesn't mean that Hadoop would be at its "ending". Cheers, Chris On Aug 30, 2012, at 11:02 PM, Andrew Purtell wrote: > Looking at the voting, it appears YARN wants to become a TLP RIGHT NOW but > at the price of the complete decoherence of the Apache Hadoop platform. For > all of us who have invested in the Apache Hadoop platform, how does this > benefit us? Certainly our interests seem to get little consideration with > this plan to just blow everything up tomorrow. > > How does a downstream project that imports HDFS and MapReduce coordinate > the shared dependencies with those new projects? For, example Guava. One > could have a multi way library incompatibility problem; this has already > happened in the large with HDFS, HBase, and Pig. It's DLL hell magnified 3 > or 4 times just in the smoking ruins of "core". 
The obvious answer is: Once > these pieces are moving in different trajectories at different rates, end > users and downstream projects will be forced to negotiate with many > parties, and those parties explicitly wont care about the issues concerning > another, according to this discussion. YARN must have broken our > minicluster based MapReduce tests 5 times over the last year. HDFS took up > a certain version of Guava and this required us to refactor some code to > match that version. We had a coherent group of committers to assist us then > but that would go away. Proponents of the split seem to want exactly this > situation. BigTop was suggested as a vehicle for addressing that concern > but then explicitly rejected on this thread. A commercial vendor looking to > torpedo the ability of anyone to build something on Apache Hadoop directly > couldn't come up with a better plan, because only a full time operation can > be expected to have the resources to harmonize the pieces plus all of their > dependencies with build patches, code wrangling, testing, testing, testing. > Volunteer contributor and committer time is a precious gift. I wonder if > the many professional full time Hadoop devs voting here have lost sight of > this. Pushing your integration work downstream doesn't mean resources will > be there to pick it up. Downstream projects could be forced to reluctantly > abandon working with Apache releases for a commercial distribution such as > CDH, or the MapR platform. Or, they will be unable to move from a "known > good" combination in the face of a combinatorial explosion of dependency > changes, so their general utility to the end user steadily declines. Maybe > the consensus is that is acceptable, but I would find that kind of a sad > ending to this remarkable project. > > On Friday, August 31, 2012, Devaraj Das wrote: > >> Andrew's points are fair IMHO. 
In general, I think it makes sense to have >> the TLPs but we aren't there yet (as others have pointed out). I'd propose >> that we should think about the timelines (maybe an appropriate time is when >> we have Hadoop-2.0 GA'ed). >> >> On Aug 30, 2012, at 7:11 AM, Andrew Purtell wrote: ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: [EMAIL PROTECTED] WWW: http://sunset.usc.edu/~mattmann/ ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Adjunct Assistant Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Mattmann, Chris A 2012-08-31, 06:36
One quick fix to the below, sorry for the confusion:
On Aug 30, 2012, at 11:15 PM, Mattmann, Chris A (388J) wrote: > Hi Andrew, > > How many new Apache Foundation *members* has the Hadoop PMC added over the past > 3-4 years, and by whom (the answer to this question might surprise you)? To rephrase the above, what I meant to ask is: how many members of the Apache Hadoop PMC have been elected as members of the Apache Software Foundation in the past 3-4 years? For reference, the Apache Software Foundation's membership is elected by the existing membership [1] at annual members' meetings. However, input into membership and nominations is typically provided by ASF members who are (we hope) part of those Apache communities (existing PMC members who are also ASF members, or other ASF members who also care and watch but are not themselves on the project's PMC). Successful and healthy ASF projects typically add members to the Foundation's ranks through the standard Foundation processes. Cheers, Chris [1] http://apache.org/foundation/members.html ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: [EMAIL PROTECTED] WWW: http://sunset.usc.edu/~mattmann/ ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Adjunct Assistant Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Andrew Purtell 2012-08-31, 06:42
If Apache Hadoop -- as an umbrella or sum of its parts -- isn't practical
to develop end applications or downstream projects on, the community will disappear. I don't follow your logic. I deal with the technical realities of actually trying to use an Apache Hadoop distribution, the pieces released as source from the ASF, directly in production, and your position is dismissive if not hostile to my concerns as an end user. What "community" do you mean then? Vendors? Academics? People who like to tinker with things they can't actually use? And you can't just hand-wave that this will all work out if done RIGHT NOW, especially with something as inelegant as an SVN copy. On Friday, August 31, 2012, Mattmann, Chris A (388J) wrote: > Hi Andrew, > > How many new Apache Foundation *members* has the Hadoop PMC added over the > past > 3-4 years, and by whom (the answer to this question might surprise you)? > > The thing you and others continue not to see is that the ASF isn't about > the > most superior technical solutions, or the best refactorings to prevent > Google Guava > dependencies, the ASF is about *community* _over_ *code*. > > Period. The metrics that the Foundation and its members are interested in > are > the metrics that demonstrate the health of the project. Technical prowess > and > market-share are great, as are diverse, hungry, downstream user > communities. > But the ASF is here to create communities, communities that work together > to > develop code for public good at no charge to the public. Scope out Board > resolutions to create projects and read the repetitive text in them -- > there's a > pattern there that elucidates this. > > Also, the project members and community members here could slice and > dice the project into 50 different Top Level Projects, but it doesn't mean > that > Hadoop would be at its "ending".
> > Cheers, > Chris > > > On Aug 30, 2012, at 11:02 PM, Andrew Purtell wrote: > > > Looking at the voting, it appears YARN wants to become a TLP RIGHT NOW > but > > at the price of the complete decoherence of the Apache Hadoop platform. > For > > all of us who have invested in the Apache Hadoop platform, how does this > > benefit us? Certainly our interests seem to get little consideration with > > this plan to just blow everything up tomorrow. > > > > How does a downstream project that imports HDFS and MapReduce coordinate > > the shared dependencies with those new projects? For, example Guava. One > > could have a multi way library incompatibility problem; this has already > > happened in the large with HDFS, HBase, and Pig. It's DLL hell magnified > 3 > > or 4 times just in the smoking ruins of "core". The obvious answer is: > Once > > these pieces are moving in different trajectories at different rates, end > > users and downstream projects will be forced to negotiate with many > > parties, and those parties explicitly wont care about the issues > concerning > > another, according to this discussion. YARN must have broken our > > minicluster based MapReduce tests 5 times over the last year. HDFS took > up > > a certain version of Guava and this required us to refactor some code to > > match that version. We had a coherent group of committers to assist us > then > > but that would go away. Proponents of the split seem to want exactly this > > situation. BigTop was suggested as a vehicle for addressing that concern > > but then explicitly rejected on this thread. A commercial vendor looking > to > > torpedo the ability of anyone to build something on Apache Hadoop > directly > > couldn't come up with a better plan, because only a full time operation > can > > be expected to have the resources to harmonize the pieces plus all of > their > > dependencies with build patches, code wrangling, testing, testing, > testing. 
> > Volunteer contributor and committer time is a precious gift. I wonder if > > the many professional full time Hadoop devs voting here have lost sight > of > > this. Pushing your integration work downstream doesn't mean resources Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Mattmann, Chris A
2012-08-31, 06:50
Hi Andrew,
On Aug 30, 2012, at 11:42 PM, Andrew Purtell wrote:

> If Apache Hadoop -- as an umbrella or sum of its parts -- isn't practical to develop end applications or downstream projects on, the community will disappear.

Sure, the end-user community might disappear, but the point I'm trying to make is that the community is more than that. It's developers who build code together ("community over code"); it's folks who write documentation, who are part of the project's committee of folks working together to develop software for the public good at this Foundation. It's folks who write unit tests as part of that. It's also people who fly by on the lists and need help, or who may throw up a patch, or whatever. It's other members of the Apache Software Foundation who are charged with caring and giving a rip about the Foundation's projects. It's also downstream users of the software -- they just aren't the only folks who make up the community, that's all.

> I don't follow your logic. I deal with the technical realities of actually trying to use an Apache Hadoop distribution, the pieces released as source from the ASF, directly in production, and your position is dismissive if not hostile to my concerns as an end user.

Sorry, I wasn't trying to be dismissive. But at the same time I want to suggest that the community is broader than simply the technical folks who use the project.

> What "community" do you mean then? Vendors? Academics? People who like to tinker with things they can't actually use?

Yeah, the community I'm talking about is the larger whole that makes up the community of the project.

> And you can't just hand-wave that this will all work out if done RIGHT NOW, especially with something as inelegant as an SVN copy.

Well, the project's health is something that ought to be fixed, and it ought to be done on a timeline. *Right now* probably isn't going to be a reality. But I am doing my job as a member of the Foundation in helping to discuss, further root out, and educate the folks around here as to the way that projects work at the Foundation.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [EMAIL PROTECTED]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Andrew Purtell
2012-08-31, 07:55
The end user community might disappear, and you are ok with this? I'm
simply astonished. Who are these people showing up to help, document, be on lists, whatever, if not current or prospective end users? Who the hell shows up to write unit tests? Who is this "public" in public good? Looks to me like a small cabal of commercial concerns in this case.

I guess the only thing we are going to agree on is that confidence in Apache Hadoop project stewardship at the ASF isn't currently warranted. And here I thought things were going so well. Who knew this torpedo lurked beneath the waters. I guess just members of the cabal. There's nothing more for me to say, just maybe a few hard decisions to make, depending how this turns out.

Best regards,
- Andy
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Steve Loughran
2012-08-31, 11:54
On 29 August 2012 21:34, Tom White <[EMAIL PROTECTED]> wrote:
> Eric - I agree with Common being included in HDFS. That's what I meant by Common not having a clear enough mission to be a TLP by itself.
>

That makes sense too. Even better if you could do JIRAs/commits to the same codebase together.

>
> Arun - I'm happy to RM some of the upcoming MR releases too. Also to help out with the work on audience annotations and compatibility.
>
> Cheers,
> Tom
>
> On Wed, Aug 29, 2012 at 7:22 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
> > On Aug 29, 2012, at 10:04 AM, Arun C Murthy wrote:
> >> On Aug 29, 2012, at 10:02 AM, Suresh Srinivas wrote:
> >>>
> >>> Robert and Alejandro have brought up good questions. Here are my thoughts:
> >>> - For the first one or two releases, all the projects can coordinate and do the releases together. This should help simplify the immediate work needed. This should also help us meet the release timelines that we are working towards. As the split makes progress, this cross-project coordination will no longer be necessary. I volunteer to RM these releases and do the needed coordination from HDFS.
> >>
> >>
> >> +1 seems like a reasonable first step. Thanks for volunteering, Suresh.
> >
> > Also, I'd say we make at least 3-4 alpha/beta releases in this shape.
> >
> > I volunteer to RM for MR/YARN releases and work with Suresh.
> >
> > Arun
> >
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Robert Evans
2012-08-31, 14:34
Andrew,
I agree with you that the DLL/CLASSPATH issues are a huge concern that needs to be addressed before we can really move forward with a valid long-term split. There is hope on the horizon for that, though, with some of the OSGi work that Tom White has been doing.

Chris,

I completely agree with Andrew here. There are very *REAL* technical issues that need to be addressed before a *CLEAN* split can happen. We can make a messy one, but the ramifications are far from trivial. If we simply go in blindly, it will take months at a minimum to stabilize and get back to where we are now. You may be OK with that, but many of us are not. Simply dismissing others' concerns as invalid is not good for the community. Many of us, as individuals, have a huge vested interest in having a stable version of Hadoop with new features in it regularly released. That is why we are part of this community. It frankly baffles me that "community over code" can be used to dismiss concerns about an issue that many of us see as something that will hurt the community. I am +1 for the split, and I am +1 for doing it soon, but I am -1 on doing it without at least having a plan as to how we will tease apart the different pieces of Hadoop.

--Bobby
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Mahadev Konar
2012-08-31, 15:05
I agree with Bobby and Andrew here. As has been said on this thread, I
think the technical issues should be addressed. Just going ahead and doing the split will be counterproductive. I am all for the projects becoming TLPs (sooner rather than later), but I think we need to work through a plan for when and how to do it that addresses the issues brought up by folks in the thread.

thanks
mahadev
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Mattmann, Chris A
2012-08-31, 15:09
Hi Bobby and Andrew,
Sorry, I think both of you are still missing my point (maybe I'm wrong). And sorry that I've failed to explain it in a way that you guys understand; that's as much my issue as anyone else's.

My point is this: technical issues, such as how to pull apart components and modules, are difficult. My svn copy suggestion, and more broadly my overall suggestion to figure out how to split up the Hadoop umbrella project, had less to do with technically pulling apart any of its software product components than with suggesting a split in the members of the project management committee of the Apache Hadoop project. The svn copy I suggested was merely to provide the new committees with code to work from (in fact, the same code base they have now).

Put simply: I think you guys know a whole lot better than I do how to deliver your software product to the community. So I'm not even trying to say that I know what the ins and outs of splitting MR, YARN and HDFS entail, nor am I even trying to say "hey, you HAVE to do that part". That's the technical part.

I am saying that the current members of the Apache Software Foundation's Hadoop Project Management Committee exhibit the characteristics (not just during discrete events; it's been happening for a long time) of folks who in reality shouldn't belong to the same project management committee. Note: this is NOT a bad thing. There are probably plenty of (sub-)sets of groups at Apache and elsewhere that folks wouldn't fit into. I've enumerated some of those characteristics that you can see sometimes spill over (meta thought discussions about moving things around; or drawing arbitrary lines around pieces of code that really have nothing to do with technical stuff, and more to do with insulating and control), but there are also other concerns such as frameworks put into place (exclusivity amongst others) that themselves are pretty high indicators that this is an umbrella project.

There are social memes *around* code that certainly have an impact on the code but are not the code itself. *That* is what I am talking about. If the code splits or whatever make sense as part of the internal navel-gazing I'm suggesting regarding the *committee* of this project, then so be it. However, I have no direct say in any of that (nor would I expect to without having the merit in the code to have a say).

Hope that helps explain better where I was coming from.

Cheers,
Chris
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Roman Shaposhnik
2012-08-31, 15:59
On Tue, Aug 28, 2012 at 7:33 PM, Mattmann, Chris A (388J)
<[EMAIL PROTECTED]> wrote:
> [decided to minimize traffic and to simply put this in one thread]
>
> Hi Guys,
>
> See the recent discussion on these threads:
>
> YARN as its own Hadoop "sub project": http://s.apache.org/WW1
> Maintain a single committer list for the Hadoop project: http://s.apache.org/Owx
>
> ...and just pay attention to the Hadoop project over the last 3-4 years. It's operating as a single project, that's masking separate communities that themselves are really separate ASF projects.
>
> At the ASF, this has been a problem area called "umbrella" projects and over the years, all I've seen from them is wasted bandwidth, artificial barriers and the invention of new ways to perform process mongering and to reduce the fun in developing software at this fantastic foundation.
>
> I've talked about umbrella projects enough. We've diverted conversation enough. Enough people have tried to act like there is some technical mumbo jumbo that is preventing the eventual act of higher power that I myself hope comes should these discussions prove unfruitful through normal means.
>
> *these. are. separate. projects.*
> *there.are.not.blocker.issues.from.spinning.out.these.projects.as.their.own.communities*
>
> In this email: http://s.apache.org/rSm
>
> And in the 2 subsequent follow-ons in that thread, I've outlined a process that I'll copy through below for splitting these projects into their own TLPs:
>
> -----snip
> Process:
>
> 0. [DISCUSS] thread for <TLP name> in which you talk about #1 and #2 below, potentially draft resolution too.
>
> 1. Decide on an initial set of *PMC* members. I urge each new TLP to adopt PMC==C. See reasons I've already discussed.
>
> 2. Decide on a chair. Try not to VOTE for this explicitly, see if it can be discussed and consensus can be reached (just a thought experiment). VOTE if necessary.
>
> 3. [VOTE] thread for <TLP name>
>
> 4. Create Project:
>    a. paste resolution from #0 to board@ or;
>    b. go to general@incubator and start new Incubator project.
>
> 5. infrastructure set up.
>    MLs moving; new UNIX groups; website setup;
>    SVN setup like this:
>
>    svn copy -m "MR TLP." https://svn.apache.org/repos/asf/hadoop/ https://svn.apache.org/repos/asf/<insert cool MR name>; or
>    svn copy -m "YARN TLP." https://svn.apache.org/repos/asf/hadoop/ https://svn.apache.org/repos/asf/<insert cool YARN name>; or
>    svn copy -m "HDFS TLP." https://svn.apache.org/repos/asf/hadoop/ https://svn.apache.org/repos/asf/<insert cool HDFS name>
>
>    After all 3 have been created run:
>
>    svn remove -m "Remove Hadoop umbrella TLP. Split into separate projects." https://svn.apache.org/repos/asf/hadoop
>
> 6. (TLPs if 4a; Incubator podling if 4b;) proceed, collaborate, operate as distinct communities, and try to solve the code duplication/dependency issues from there.
>
> 7. If 4b; then graduate as TLP from Incubator.
>
> -----snip
>
> So that's my proposal.

+1 on the general idea of splitting the projects, predicated on fixing the issues that made the last split so painful and resolving technicalities like dependencies, etc.

Here's a perspective from a downstream producer of a distribution built on top of Hadoop: I firmly believe that, at least with Hadoop 2.0, we've reached a point where HDFS and YARN/MapReduce being standalone, loosely coupled projects would make much more sense. The user community of Bigtop has expressed interest in being able to mix-n-match versions of MR and HDFS, and I believe this to be a very valid (and achievable!) use case. It is less clear what to do with the Hadoop 1.X code line, but my perception so far has been that it is mainly in maintenance mode and thus could be dealt with as an exceptional case.

I've heard some integration concerns on this thread, and while I appreciate them, I still believe that individual projects shouldn't be burdened by them to the extent that they can maintain reasonable API compatibility. It is my personal opinion that HDFS and YARN/MapReduce of Hadoop 2.0 are ready to do that. Bigtop is there to keep them honest, provided that folks are willing to help us with that mission.

Thanks,
Roman.
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Doug Cutting
2012-08-31, 16:00
On Fri, Aug 31, 2012 at 8:09 AM, Mattmann, Chris A (388J)
<[EMAIL PROTECTED]> wrote:
> I am saying that the current members of the Apache Software Foundation's Hadoop Project Management Committee exhibit the characteristics (not just during discrete events; it's been happening for a long time) of folks who in reality shouldn't belong to the same project management committee. Note: this is NOT a bad thing. There are probably plenty of (sub-)sets of groups at Apache and elsewhere that folks wouldn't fit into. I've enumerated some of those characteristics that you can see sometimes spill over (meta thought discussions about moving things around; or drawing arbitrary lines around pieces of code that really have nothing to do with technical stuff, and more to do with insulating and control),

Hadoop's community is not perfect. But the divisions in the community are not primarily aligned with subcomponent boundaries. A project split will thus not likely fix the majority of these community imperfections. It may fix some, but ought to be pursued carefully so that it doesn't cause more harm than good.

> but there are also other concerns such as frameworks put into place (exclusivity amongst others) that themselves are pretty high indicators that this is an umbrella project.

The partitioning of committers has now been removed in a separate vote. Hadoop is not a classic umbrella project.

Doug
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Mattmann, Chris A
2012-08-31, 16:08
Hey Doug,
On Aug 31, 2012, at 9:00 AM, Doug Cutting wrote:

> On Fri, Aug 31, 2012 at 8:09 AM, Mattmann, Chris A (388J)
> <[EMAIL PROTECTED]> wrote:
>> I am saying that the current members of the Apache Software Foundation's Hadoop Project Management Committee exhibit the characteristics (not just during discrete events; it's been happening for a long time) of folks who in reality shouldn't belong to the same project management committee. Note: this is NOT a bad thing. There are probably plenty of (sub-)sets of groups at Apache and elsewhere that folks wouldn't fit into. I've enumerated some of those characteristics that you can see sometimes spill over (meta thought discussions about moving things around; or drawing arbitrary lines around pieces of code that really have nothing to do with technical stuff, and more to do with insulating and control),
>
> Hadoop's community is not perfect. But the divisions in the community are not primarily aligned with subcomponent boundaries. A project split will thus not likely fix the majority of these community imperfections. It may fix some, but ought to be pursued carefully so that it doesn't cause more harm than good.

My own personal opinion is that, yeah, they aren't necessarily aligned with subcomponent boundaries either, so +1, I agree with you.

>
>> but there are also other concerns such as frameworks put into place (exclusivity amongst others) that themselves are pretty high indicators that this is an umbrella project.
>
> The partitioning of committers has now been removed in a separate vote. Hadoop is not a classic umbrella project.

Although I think that's a band-aid, it's probably at least a good start. Let's hope it leads to some better interactions amongst the community members and to better health overall.

Cheers,
Chris
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Eli Collins
2012-08-31, 16:54
How about a proposal to just spin YARN off as a TLP? Rationale:
1. YARN started as a separate project and has a more independent community than Common/HDFS/MR (per below, these communities do not divide at sub-project boundaries) that appears to want to be even more independent.

2. YARN is technically much easier to separate from the rest of the code base (than separating Common and HDFS, for example). Separating it out will also help accelerate other efforts like MR2 support for Apache Mesos.

3. It sidesteps a number of thorny issues (how to handle branch-1, how to handle what Hadoop is with respect to enforcing trademark, how to remove people from the Hadoop PMC, etc.) that haven't been addressed in any of these proposals.

4. It's a proof point - if you can't make the case for YARN, then there's no way we're going to make a case for splitting the other projects (this thread).

I.e., this doesn't have to be an all-or-nothing proposition for all sub-projects, since the communities don't fall on sub-project boundaries.

Thanks,
Eli
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Robert Evans 2012-08-31, 16:58
The problem there is that YARN depends on Common, and MapReduce depends on
YARN, so we would either have a circular dependency or we would have to split off MapReduce too. --Bobby On 8/31/12 11:54 AM, "Eli Collins" <[EMAIL PROTECTED]> wrote: >How about a proposal to just spin YARN off as a TLP? Rationale: > >1. YARN started as a separate project and has a more independent >community than Common/HDFS/MR (per below these communities do not >divide at sub-project boundaries) that appears to want to be even more >independent. > >2. YARN is technically much easier to separate from the rest of the >code base (than separating Common and HDFS for example). Separating it >out will also help accelerate other efforts like MR2 support for >Apache Mesos. > >3. It side steps a number of thorny issues (how to handle branch-1, >how to handle what Hadoop is wrt enforcing trademark, who to remove >people from the Hadoop PMC, etc) that haven't been addressed in any of >these proposals. > >4. It's a proof point - if you can't make the case for YARN then >there's no way we're going to make a case for splitting the other >projects (this thread). > >Ie this doesn't have to be an all-or-nothing proposition for all >sub-projects, since the communities don't fall on sub-project >boundaries. > >Thanks, >Eli > >On Tue, Aug 28, 2012 at 7:33 PM, Mattmann, Chris A (388J) ><[EMAIL PROTECTED]> wrote: >> [decided to minimize traffic and to simply put this in one thread] >> >> Hi Guys, >> >> See the recent discussion on these threads: >> >> YARN as its own Hadoop "sub project": http://s.apache.org/WW1 >> Maintain a single committer list for the Hadoop project: >>http://s.apache.org/Owx >> >> ...and just pay attention to the Hadoop project over the last 3-4 >>years. It's operating >> as a single project, that's masking separate communities that >>themselves are really >> separate ASF projects.
>> >> At the ASF, this has been a problem area called "umbrella" projects and >>over the years, >> all I've seen from them is wasted bandwidth, artificial barriers and >>the inventions of >> new ways to perform process mongering and to reduce the fun in >>developing software >> at this fantastic foundation. >> >> I've talked about umbrella projects enough. We've diverted conversation >>enough. >> Enough people have tried to act like there is some technical mumbo >>jumbo that is >> preventing the eventual act of higher power that I myself hope comes >>should these >> discussions prove unfruitful through normal means. >> >> *these. are. separate. projects.* >> >>*there.are.not.blocker.issues.from.spinning.out.these.projects.as.their.o >>wn.communities* >> >> In this email: http://s.apache.org/rSm >> >> And in the 2 subsequent follow ons in that thread, I've outlined a >>process that I'll copy >> through below for splitting these projects into their own TLPs: >> >> -----snip >> Process: >> >> 0. [DISCUSS] thread for <TLP name> in which you talk about #1 and #2 >>below, potentially draft resolution too. >> >> 1. Decide on an initial set of *PMC* members. I urge each new TLP to >>adopt PMC==C. See reasons I've >> already discussed. >> >> 2. Decide on a chair. Try not to VOTE for this explicitly, see if can >>be discussed and consensus >> can be reached (just a thought experiment). VOTE if necessary. >> >> 3. [VOTE] thread for <TLP name> >> >> 4. Create Project: >> a. paste resolution from #0 to board@ or; >> b. go to general@incubator and start new Incubator project. >> >> 5. infrastructure set up. >> MLs moving; new UNIX groups; website setup; >> SVN setup like this: >> >> svn copy -m "MR TLP." https://svn.apache.org/repos/asf/hadoop/ >>https://svn.apache.org/repos/asf/<insert cool MR name>; or >> svn copy -m "YARN TLP." https://svn.apache.org/repos/asf/hadoop/ >>https://svn.apache.org/repos/asf/<insert cool YARN name>; or >> svn copy -m "HDFS TLP." 
https://svn.apache.org/repos/asf/hadoop/ >>https://svn.apache.org/repos/asf/<insert cool HDFS name> >>
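To make Robert's dependency point concrete, here is a minimal sketch in Java that models the code bases as a directed dependency graph and looks for a cycle with a depth-first search. The node names ("hadoop", "yarn", "common", "mapreduce") are coarse, hypothetical labels for whole code bases, not real Maven artifact IDs. Spinning out only YARN while MapReduce stays inside Hadoop gives hadoop → yarn → hadoop, which is a cycle; splitting MapReduce off as well leaves an acyclic chain.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model: each node is a whole code base; an edge A -> B means
// "A needs B at build time".
final class DependencyGraph {
    private final Map<String, List<String>> edges = new HashMap<>();

    // Records that `from` depends on `to` (both become nodes).
    void dependsOn(String from, String to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        edges.computeIfAbsent(to, k -> new ArrayList<>());
    }

    // True if some code base transitively depends on itself.
    boolean hasCycle() {
        Set<String> done = new HashSet<>();
        Set<String> onStack = new HashSet<>();
        for (String node : edges.keySet()) {
            if (visit(node, done, onStack)) {
                return true;
            }
        }
        return false;
    }

    private boolean visit(String node, Set<String> done, Set<String> onStack) {
        if (onStack.contains(node)) {
            return true;  // back edge: we came around to a node still being explored
        }
        if (done.contains(node)) {
            return false; // already fully explored, no cycle through it
        }
        onStack.add(node);
        for (String next : edges.get(node)) {
            if (visit(next, done, onStack)) {
                return true;
            }
        }
        onStack.remove(node);
        done.add(node);
        return false;
    }
}
```

For example, `dependsOn("hadoop", "yarn")` (MR, still in Hadoop, needs YARN) plus `dependsOn("yarn", "hadoop")` (YARN needs Common) makes `hasCycle()` return true. Inverting the relationship, so that a YARN-side module depends on an interface kept in Hadoop rather than the other way round, removes the hadoop → yarn edge and breaks the cycle as well.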
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Todd Lipcon 2012-08-31, 16:59
On Thu, Aug 30, 2012 at 11:50 PM, Mattmann, Chris A (388J)
<[EMAIL PROTECTED]> wrote: > Hi Andrew, > > On Aug 30, 2012, at 11:42 PM, Andrew Purtell wrote: > >> If Apache Hadoop -- as an umbrella or sum of its parts -- isn't practical >> to develop end applications or downstream projects on, the community will >> disappear. > > Sure, the end-user community might disappear, but the point I'm trying to make is > that the community is more than that. It's developers that build code together > ("community over code"); it's folks who write documentation who are part of the > project's committee of folks working together to develop software for the public > good at this Foundation. It's folks who write unit tests as part of that. It's also people > that fly by on the lists and that need help; or that may throw up a patch, or > whatever. It's other members of the Apache Software Foundation that are > charged with caring and giving a rip about the Foundation's projects. Well, speaking as one of the developer community who hasn't been a traditional user of Hadoop since my previous job in 2008: if the end user community started to languish, I (and 80% of the other most involved contributors) would probably stop working on the project pretty quickly. We're here because a user community exists, which funds our employers, who fund us. Another point I'll make is that I've talked to a number of former contributors (from the 0.20 days) who pretty much stopped contributing because of the code base churn around the prior project split. It became too much effort to forward and back port patches from their internal branches, so their cost/reward tradeoff dipped negative. So there are real community costs associated with what seem like "technical" changes. I don't know who came up with the original "community over code" mantra, or whether the ASF truly thinks these are hard and fast rules rather than principles and guidelines. But, if I may be so bold, I would much prefer the mantra of "community around code". 
Without the code at the center of any project, we'd just be a bunch of nerds shooting the shit. The code's what ties us together, and the pressure of keeping a centralized codebase that we can all feel good about shipping is what allows us to get past our differences and produce high quality software. The best reference I can find on apache.org is the Committer's FAQ: http://www.apache.org/dev/committers.html where it says explicitly: > Note: While there is not an official list, the following six principles have been cited as the core beliefs of The Apache Way: > - collaborative software development > - commercial-friendly standard license > - consistently high quality software > - respectful, honest, technical-based interaction > - faithful implementation of standards > - security as a mandatory feature Maybe you disagree, but from my perspective, we're doing reasonably well on all of them. You may not think there's much collaboration, but in the last 2-3 weeks, I have collaborated on Hadoop-related work with developers from Trend Micro, Facebook, Calxeda, Hortonworks, and interacted with users from a much wider variety of organizations. As Andrew said, I thought we were going along pretty well before this thread. As for technical things we need to do to get to a feasible split: big +1 that classpath pollution issues are near top of the list. We need a reasonable classloader strategy, and I think Tom's OSGi stuff is a good start in that direction. But it's going to be quite some time before that's all integrated and pulled into dependent projects, etc. So let's work on it but not be rash in our decisions. -Todd -- Todd Lipcon Software Engineer, Cloudera
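On the classpath pollution point: one common, simpler alternative to OSGi is a child-first (parent-last) classloader, which lets a component's own copies of third-party libraries take precedence over the versions on the framework's classpath. The sketch below illustrates that general technique only. It is not Tom's OSGi work or any existing Hadoop class, and the name `ChildFirstClassLoader` is hypothetical. A production version would need a longer list of packages that must always come from the parent.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical child-first (parent-last) class loader: classes found in this
// loader's own URLs are preferred over the parent's copies.
class ChildFirstClassLoader extends URLClassLoader {

    ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                if (name.startsWith("java.")) {
                    // JDK classes must always come from the parent chain.
                    // (A real implementation would list more shared prefixes.)
                    c = super.loadClass(name, false);
                } else {
                    try {
                        // Look in this loader's own URLs first...
                        c = findClass(name);
                    } catch (ClassNotFoundException e) {
                        // ...and fall back to normal parent-first delegation.
                        c = super.loadClass(name, false);
                    }
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

The trade-off is the one Jagane raises about bundles in general: isolation fixes version clashes, but classes shared across the boundary (for example, interfaces passed between framework and user code) must be loaded by the same loader on both sides, or casts fail at runtime.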
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Todd Lipcon 2012-08-31, 17:06
On Fri, Aug 31, 2012 at 9:58 AM, Robert Evans <[EMAIL PROTECTED]> wrote:
> The problem there is that YARN depends on Common, and MapReduce depends on > YARN, so we would either have a circular dependency or we would have to > split off MapRedcue too. I haven't been in the MR codebase much of late, so I'll defer to your judgment here: would it be feasible to have an abstraction layer for "cluster manager" separated out into a pile of interfaces? Then we could leave MR inside Hadoop, and Yarn would have an "MR->Yarn binding" module. I'm not sure where the line would be drawn, but one possibility would be to separate out the MR _task_ code from the MR scheduling code (AM, Job Submission, etc.). Again, it would be a large project, but as Eli said, it would help make MR more "relocatable" onto other cluster schedulers like Mesos (or even a traditional grid scheduler). Another possible boon there would be something I've discussed with Arun a few times: it would be cool if we could get the new MR task code (in particular the rewritten reduce, but also some of the new exciting work that Tsuyoshi and Mariappan are doing) running in the context of an MR1 cluster. -Todd > > On 8/31/12 11:54 AM, "Eli Collins" <[EMAIL PROTECTED]> wrote: > >>How about a proposal to just spin YARN off as a TLP? Rationale: >> >>1. YARN started as a separate project and has a more independent >>community than Common/HDFS/MR (per below these communities do not >>divide at sub-project boundaries) that appears to want to be even more >>independent. >> >>2. YARN is technically much easier to separate from the rest of the >>code base (than separating Common and HDFS for example). Separating it >>out will also help accelerate other efforts like MR2 support for >>Apache Mesos. >> >>3. It side steps a number of thorny issues (how to handle branch-1, >>how to handle what Hadoop is wrt enforcing trademark, who to remove >>people from the Hadoop PMC, etc) that haven't been addressed in any of >>these proposals. >> >>4.
It's a proof point - if you can't make the case for YARN then >>there's no way we're going to make a case for splitting the other >>projects (this thread). >> >>Ie this doesn't have to be an all-or-nothing proposition for all >>sub-projects, since the communities don't fall on sub-project >>boundaries. >> >>Thanks, >>Eli >> >>On Tue, Aug 28, 2012 at 7:33 PM, Mattmann, Chris A (388J) >><[EMAIL PROTECTED]> wrote: >>> [decided to minimize traffic and to simply put this in one thread] >>> >>> Hi Guys, >>> >>> See the recent discussion on these threads: >>> >>> YARN as its own Hadoop "sub project": http://s.apache.org/WW1 >>> Maintain a single committer list for the Hadoop project: >>>http://s.apache.org/Owx >>> >>> ...and just pay attention to the Hadoop project over the last 3-4 >>>years. It's operating >>> as a single project, that's masking separate communities that >>>themselves are really >>> separate ASF projects. >>> >>> At the ASF, this has been a problem area called "umbrella" projects and >>>over the years, >>> all I've seen from them is wasted bandwidth, artificial barriers and >>>the inventions of >>> new ways to perform process mongering and to reduce the fun in >>>developing software >>> at this fantastic foundation. >>> >>> I've talked about umbrella projects enough. We've diverted conversation >>>enough. >>> Enough people have tried to act like there is some technical mumbo >>>jumbo that is >>> preventing the eventual act of higher power that I myself hope comes >>>should these >>> discussions prove unfruitful through normal means. >>> >>> *these. are. separate. projects.* >>> >>>*there.are.not.blocker.issues.from.spinning.out.these.projects.as.their.o >>>wn.communities* >>> >>> In this email: http://s.apache.org/rSm >>> >>> And in the 2 subsequent follow ons in that thread, I've outlined a >>>process that I'll copy >>> through below for splitting these projects into their own TLPs: >>> >>> -----snip >>> Process: >>> >>> 0. 
[DISCUSS] thread for <TLP name> in which you talk about #1 and #2 Todd Lipcon Software Engineer, Cloudera
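Todd's "cluster manager" abstraction can be sketched as a plain Java interface that MR code programs against, with each scheduler supplying its own binding, much as Robert notes HDFS does with FileSystem. Everything below (`ClusterManager`, `LocalClusterManager`, `JobRunner`, and the `allocate`/`release` methods) is hypothetical and does not correspond to any real Hadoop or YARN API. It only shows which way the dependency arrow would point: the interface and the job-side code stay in Hadoop, and a binding would live with the scheduler.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical "cluster manager" abstraction. MR task/job code would depend
// only on this interface, which could stay inside Hadoop.
interface ClusterManager {
    // Asks the scheduler for `count` containers of `memoryMb` each and
    // returns opaque container ids.
    List<String> allocate(int count, int memoryMb);

    // Returns a container to the scheduler.
    void release(String containerId);
}

// Hypothetical toy binding. A real "MR->Yarn" or Mesos binding would live in
// the scheduler's own project and implement the same interface.
final class LocalClusterManager implements ClusterManager {
    private final Set<String> live = new LinkedHashSet<>();
    private int nextId = 0;

    @Override
    public List<String> allocate(int count, int memoryMb) {
        // memoryMb is ignored by this in-process toy; a real binding would
        // forward it to its scheduler.
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            String id = "container_" + nextId++;
            live.add(id);
            ids.add(id);
        }
        return ids;
    }

    @Override
    public void release(String containerId) {
        live.remove(containerId);
    }

    int liveCount() {
        return live.size();
    }
}

// Hypothetical job-side code: compiled against the interface only, so it
// never imports anything from YARN or Mesos.
final class JobRunner {
    static List<String> launchTasks(ClusterManager cm, int numTasks, int memoryMb) {
        return cm.allocate(numTasks, memoryMb);
    }
}
```

Swapping schedulers then means passing a different `ClusterManager` implementation to `JobRunner.launchTasks`, without recompiling the job-side code.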
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Alejandro Abdelnur 2012-08-31, 17:10
On Fri, Aug 31, 2012 at 9:59 AM, Todd Lipcon <[EMAIL PROTECTED]> wrote:
> As for technical things we need to do to get to a feasible split: big > +1 that classpath pollution issues are near top of the list. We need a > reasonable classloader strategy, and I think Tom's OSGi stuff is a > good start in that direction. But it's going to be quite some time > before that's all integrated and pulled into dependent projects, etc. > So let's work on it but not be rash in our decisions. Seriously, this is a MUST. Until we address this, splitting is like a broken pen. Thx -- Alejandro
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Alejandro Abdelnur 2012-08-31, 17:11
s/pen/pencil/
On Fri, Aug 31, 2012 at 10:10 AM, Alejandro Abdelnur <[EMAIL PROTECTED]> wrote: > On Fri, Aug 31, 2012 at 9:59 AM, Todd Lipcon <[EMAIL PROTECTED]> wrote: > >> As for technical things we need to do to get to a feasible split: big >> +1 that classpath pollution issues are near top of the list. We need a >> reasonable classloader strategy, and I think Tom's OSGi stuff is a >> good start in that direction. But it's going to be quite some time >> before that's all integrated and pulled into dependent projects, etc. >> So let's work on it but not be rash in our decisions. > > Seriously, this is a MUST. Until we address this, splitting is like a > broken pen. > > Thx > > -- > Alejandro -- Alejandro
-
RE: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Jagane Sundar 2012-08-31, 17:24
> As for technical things we need to do to get to a feasible split: big > +1 that classpath pollution issues are near top of the list. We need a > reasonable classloader strategy, and I think Tom's OSGi stuff is a > good start in that direction. But it's going to be quite some time > before that's all integrated and pulled into dependent projects, etc. > So let's work on it but not be rash in our decisions. Just a quick comment regarding the OSGi specification: Eclipse plugins use OSGi 'bundles', and this is the most excruciatingly painful aspect of building plugins for Eclipse. I am sure there are other experts here who can chime in, but search Google for Eclipse plugin classpath problems and you will get an earful... Jagane
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Robert Evans 2012-08-31, 18:15
That would be wonderful to have. +1. I would love to see MR run on more
than just HDFS/YARN, so people can pick whatever execution environment makes sense for them, just like what MPI does, or something like what HDFS does with FileSystem. My perspective was just based on the current state of things; if we want to invert the relationship, that fixes the problem. I would be happy to help with doing that. --Bobby On 8/31/12 12:06 PM, "Todd Lipcon" <[EMAIL PROTECTED]> wrote: >On Fri, Aug 31, 2012 at 9:58 AM, Robert Evans <[EMAIL PROTECTED]> wrote: >> The problem there is that YARN depends on Common, and MapReduce depends >>on >> YARN, so we would either have a circular dependency or we would have to >> split off MapRedcue too. > >I haven't been in the MR codebase much of late, so I'll defer to your >judgment here: would it be feasible to have an abstraction layer for >"cluster manager" separated out into a pile of interfaces? Then we >could leave MR inside Hadoop, and Yarn would have an "MR->Yarn >binding" module. I'm not sure where the line would be drawn, but one >possibility would be to separate out the MR _task_ code from the MR >scheduling code (AM, Job Submission, etc) > >Again would be a large project, but as Eli said, it would help make MR >more "relocatable" onto other cluster schedulers like Mesos (or even a >traditional grid scheduler). Another possible boon there would be >something I've discussed with Arun a few times: it would be cool if we >could get the new MR task code (in particular the rewritten reduce, >but also some of the new exciting work that Tsuyoshi and Mariappan are >doing) running in the context of an MR1 cluster. > >-Todd > >> >> On 8/31/12 11:54 AM, "Eli Collins" <[EMAIL PROTECTED]> wrote: >> >>>How about a proposal to just spin YARN off as a TLP? Rationale: >>> >>>1. YARN started as a separate project and has a more independent >>>community than Common/HDFS/MR (per below these communities do not >>>divide at sub-project boundaries) that appears to want to be even more >>>independent. >>> >>>2.
YARN is technically much easier to separate from the rest of the >>>code base (than separating Common and HDFS for example). Separating it >>>out will also help accelerate other efforts like MR2 support for >>>Apache Mesos. >>> >>>3. It side steps a number of thorny issues (how to handle branch-1, >>>how to handle what Hadoop is wrt enforcing trademark, who to remove >>>people from the Hadoop PMC, etc) that haven't been addressed in any of >>>these proposals. >>> >>>4. It's a proof point - if you can't make the case for YARN then >>>there's no way we're going to make a case for splitting the other >>>projects (this thread). >>> >>>Ie this doesn't have to be an all-or-nothing proposition for all >>>sub-projects, since the communities don't fall on sub-project >>>boundaries. >>> >>>Thanks, >>>Eli >>> >>>On Tue, Aug 28, 2012 at 7:33 PM, Mattmann, Chris A (388J) >>><[EMAIL PROTECTED]> wrote: >>>> [decided to minimize traffic and to simply put this in one thread] >>>> >>>> Hi Guys, >>>> >>>> See the recent discussion on these threads: >>>> >>>> YARN as its own Hadoop "sub project": http://s.apache.org/WW1 >>>> Maintain a single committer list for the Hadoop project: >>>>http://s.apache.org/Owx >>>> >>>> ...and just pay attention to the Hadoop project over the last 3-4 >>>>years. It's operating >>>> as a single project, that's masking separate communities that >>>>themselves are really >>>> separate ASF projects. >>>> >>>> At the ASF, this has been a problem area called "umbrella" projects >>>>and >>>>over the years, >>>> all I've seen from them is wasted bandwidth, artificial barriers and >>>>the inventions of >>>> new ways to perform process mongering and to reduce the fun in >>>>developing software >>>> at this fantastic foundation. >>>> >>>> I've talked about umbrella projects enough. We've diverted >>>>conversation >>>>enough. >>>> Enough people have tried to act like there is some technical mumbo >>
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Inder.dev Java 2012-08-31, 19:00
How often do you call for Emeriti lists?
Otherwise, if the list simply keeps growing, people may be surprised like this. Also, some people (Eli Collins) raised concerns about the growing list above, right? In reality, many of those people may not be active and may not have looked at the project in a long time. Keeping them on the active list will not help Hadoop, right? If they really want to become active and help the project, they can regain their status at that time. Otherwise, people may hesitate to add new people because the list is already so big. No? -tx On Thu, Aug 30, 2012 at 10:30 PM, Doug Cutting <[EMAIL PROTECTED]> wrote: > On Thu, Aug 30, 2012 at 12:33 PM, Inder.dev Java <[EMAIL PROTECTED]> > wrote: > > I am curious to know how that many people got access in Map Reduce/HDFS. > > Many of these are folks who were more active in the past. Hadoop is > now 6.5 years old. > > At Apache, merit does not expire: > > http://www.apache.org/dev/committers.html#committer-set-term > > Doug >
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Doug Cutting 2012-08-31, 20:44
On Fri, Aug 31, 2012 at 12:00 PM, Inder.dev Java <[EMAIL PROTECTED]> wrote:
> How often you call for Emeriti lists. > Otherwise , if list is simply growing, then people may surprise like this. > And also some people(Eli Collins) showed concerns about growing lists above > right. But in reality all that people may not be active and not looking to > project from long. Having them in active list will not help to Hadoop > right. If they really want to active and help to project, they can regain > at that time. Otherwise people may think to add new people, as list already > big like above. no? Keeping people who are no longer active on the committer list shouldn't cause problems. No quorum is required for votes. Emeritus is used for folks who no longer follow the project at all. Some committers may no longer be contributing code regularly but they should still be reading the developer mailing lists and may vote. More active contributors primarily determine the current technical direction of the project by making contributions. Doug
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Eric Baldeschwieler 2012-08-31, 22:43
I'd be fascinated to hear more from folks who have led other projects at Apache about how Hadoop's (and Lucene's, same DNA) committer management process compares, and about good and bad lessons learned from those projects. Chris M has mentioned his experience from other projects. Can others comment?
Many projects do seem to have Emeriti lists / processes. How do people feel about that? How does the committer list size of Hadoop compare to other major Apache projects? Other lessons learned? On Aug 31, 2012, at 1:44 PM, Doug Cutting wrote: > On Fri, Aug 31, 2012 at 12:00 PM, Inder.dev Java <[EMAIL PROTECTED]> wrote: >> How often you call for Emeriti lists. >> Otherwise , if list is simply growing, then people may surprise like this. >> And also some people(Eli Collins) showed concerns about growing lists above >> right. But in reality all that people may not be active and not looking to >> project from long. Having them in active list will not help to Hadoop >> right. If they really want to active and help to project, they can regain >> at that time. Otherwise people may think to add new people, as list already >> big like above. no? > > Keeping people who are no longer active on the committer list > shouldn't cause problems. No quorum is required for votes. Emeritus > is used for folks who no longer follow the project at all. Some > committers may no longer be contributing code regularly but they > should still be reading the developer mailing lists and may vote. > More active contributors primarily determine the current technical > direction of the project by making contributions. > > Doug
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Eric Baldeschwieler 2012-09-01, 00:23
Hi Folks,
Lots of good points raised here. I remain convinced that the time to split Hadoop into TLPs is here, but I think we should also consider the practical concerns raised. Hadoop 2.0 has been years of work in the making and is finally relatively close. I think it would be a mistake to throw another impediment in the way of getting a stable version of 2.0 done, as many folks have pointed out. So I'd suggest that we plan to do a split once there is a broad consensus that 2.0 is stable and widely deployed. Perhaps folks interested in planning a split or concerned that a split might impact them can meet to refine a proposal that we can all consider implementing once 2.0 is stable. What do folks think? Thanks, E14 On Aug 31, 2012, at 9:08 AM, Mattmann, Chris A (388J) wrote: > Hey Doug, > > On Aug 31, 2012, at 9:00 AM, Doug Cutting wrote: > >> On Fri, Aug 31, 2012 at 8:09 AM, Mattmann, Chris A (388J) >> <[EMAIL PROTECTED]> wrote: >>> I am saying that the current members of the Apache Software Foundation's Hadoop >>> Project Management Committee exhibit the characteristics (not just during >>> discrete events; it's been happening for a long time) of folks who in reality >>> shouldn't belong to the same project management committee. Note: this is >>> NOT a bad thing. There are probably plenty of (sub-)sets of groups at Apache >>> and elsewhere that folks wouldn't fit in to. I've enumerated some of >>> those characteristics that you can see sometimes spill over >>> (meta thought discussions about moving things around; or drawing arbitrary >>> lines around pieces of code that really have nothing to do with technical >>> stuff, and more to do about insulating and control;), >> >> Hadoop's community is not perfect. But the divisions in the community >> are not primarily aligned with subcomponent boundaries. A project >> split will thus not likely fix the majority of these community >> imperfections.
It may fix some, but ought to be pursued carefully so >> that it doesn't cause more harm than good. > > My own personal opinion of this is that yeah they aren't necessarily > aligned subcomponent boundaries too so +1 agree with you. > >> >>> but there are also other >>> concerns such as frameworks put in to place (exclusivity amongst others) >>> that themselves are pretty high indicators that this is an umbrella project. >> >> The partitioning of committers has now been removed in a separate >> vote. Hadoop is not a classic umbrella project. > > Despite me thinking that's a band-aid it's probably at least a good start. > Let's hope it leads to some better interactions amongst the community > members and to better health overall. > > Cheers, > Chris > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > Chris Mattmann, Ph.D. > Senior Computer Scientist > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA > Office: 171-266B, Mailstop: 171-246 > Email: [EMAIL PROTECTED] > WWW: http://sunset.usc.edu/~mattmann/ > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > Adjunct Assistant Professor, Computer Science Department > University of Southern California, Los Angeles, CA 90089 USA > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Sharad Agarwal 2012-09-01, 09:59
>
> > Hadoop 2.0 has been years of work in the making and is finally relatively > close. I think it would be a mistake to throw another impediment in the > way of getting a stable version of 2.0 done, as many folks have pointed > out. So I'd suggest that we plan to do a split once there is a broad > consensus that 2.0 is stable and widely deployed. > > +1 for revisiting the split once we have stable Hadoop 2.0 with large-scale and wide deployments.
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Andrew Purtell 2012-09-01, 13:21
As a downstream user of and contributor to Bigtop (though only once, I know
we need to do better, Roman), it would be awesome to see the community rally around it as an integration point if the project splits into finer grained components yet. On Friday, August 31, 2012, Roman Shaposhnik wrote: > On Tue, Aug 28, 2012 at 7:33 PM, Mattmann, Chris A (388J) > <[EMAIL PROTECTED] <javascript:;>> wrote: > > [decided to minimize traffic and to simply put this in one thread] > > > > Hi Guys, > > > > See the recent discussion on these threads: > > > > YARN as its own Hadoop "sub project": http://s.apache.org/WW1 > > Maintain a single committer list for the Hadoop project: > http://s.apache.org/Owx > > > > ...and just pay attention to the Hadoop project over the last 3-4 years. > It's operating > > as a single project, that's masking separate communities that themselves > are really > > separate ASF projects. > > > > At the ASF, this has been a problem area called "umbrella" projects and > over the years, > > all I've seen from them is wasted bandwidth, artificial barriers and the > inventions of > > new ways to perform process mongering and to reduce the fun in > developing software > > at this fantastic foundation. > > > > I've talked about umbrella projects enough. We've diverted conversation > enough. > > Enough people have tried to act like there is some technical mumbo jumbo > that is > > preventing the eventual act of higher power that I myself hope comes > should these > > discussions prove unfruitful through normal means. > > > > *these. are. separate. projects.* > > > *there.are.not.blocker.issues.from.spinning.out.these.projects.as.their.own.communities* > > > > In this email: http://s.apache.org/rSm > > > > And in the 2 subsequent follow ons in that thread, I've outlined a > process that I'll copy > > through below for splitting these projects into their own TLPs: > > > > -----snip > > Process: > > > > 0. [DISCUSS] thread for <TLP name> in which you talk about #1 and #2 > below, potentially draft resolution too. > > > > 1. 
Decide on an initial set of *PMC* members. I urge each new TLP to > adopt PMC==C. See reasons I've > > already discussed. > > > > 2. Decide on a chair. Try not to VOTE for this explicitly, see if can be > discussed and consensus > > can be reached (just a thought experiment). VOTE if necessary. > > > > 3. [VOTE] thread for <TLP name> > > > > 4. Create Project: > > a. paste resolution from #0 to board@ or; > > b. go to general@incubator and start new Incubator project. > > > > 5. infrastructure set up. > > MLs moving; new UNIX groups; website setup; > > SVN setup like this: > > > > svn copy -m "MR TLP." https://svn.apache.org/repos/asf/hadoop/ > https://svn.apache.org/repos/asf/<insert cool MR name>; or > > svn copy -m "YARN TLP." https://svn.apache.org/repos/asf/hadoop/ > https://svn.apache.org/repos/asf/<insert cool YARN name>; or > > svn copy -m "HDFS TLP." https://svn.apache.org/repos/asf/hadoop/ > https://svn.apache.org/repos/asf/<insert cool HDFS name> > > > > After all 3 have been created run: > > > > svn remove -m "Remove Hadoop umbrella TLP. Split into separate > projects." https://svn.apache.org/repos/asf/hadoop > > > > 6. (TLPs if 4a; Incubator podling if 4b;) proceed, collaborate, operate > as distinct communities, and try to solve the code duplication/dependency > > issues from there. > > > > 7. If 4b; then graduate as TLP from Incubator. > > > > -----snip > > > > So that's my proposal. > > +1 on the general idea of splitting the projects predicated on > fixing the issues that made the last split so painful and resolving > technicalities like dependencies, etc. > > Here's a perspective of a downstream producer of a distribution > built on top of Hadoop: I firmly believe that at least with Hadoop 2.0 > we've reached a point where HDFS and YARN/Mapreduce being > standalone loosely coupled projects would make much more sense. 
> The user community of Bigtop has expressed interest in being able to Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Andrew Purtell 2012-09-01, 13:32
Or resurrect MR(v1) in Apache Hadoop as Apache YARN becomes a TLP, and let
the new YARN TLP decide if they want to use the Hadoop MR artifacts and/or contribute patches that harmonize the implementation with theirs, or pursue an alternate MR implementation within their larger framework. I'd imagine such a MR(v1) in Hadoop, if this happened, would concentrate on performance improvements, maybe such things as alternate shuffle plugins. Perhaps a HA JobTracker for parity with HDFS. But we could expect a clear separation where next generation framework work would be continued in and centered upon YARN, while Hadoop remains... well, Hadoop. On Friday, August 31, 2012, Robert Evans wrote: > The problem there is that YARN depends on Common, and MapReduce depends on > YARN, so we would either have a circular dependency or we would have to > split off MapRedcue too. > > --Bobby > > On 8/31/12 11:54 AM, "Eli Collins" <[EMAIL PROTECTED]> wrote: > > >How about a proposal to just spin YARN off as a TLP? Rationale: > > > >1. YARN started as a separate project and has a more independent > >community than Common/HDFS/MR (per below these communities do not > >divide at sub-project boundaries) that appears to want to be even more > >independent. > > > >2. YARN is technically much easier to separate from the rest of the > >code base (than separating Common and HDFS for example). Separating it > >out will also help accelerate other efforts like MR2 support for > >Apache Mesos. > > > >3. It side steps a number of thorny issues (how to handle branch-1, > >how to handle what Hadoop is wrt enforcing trademark, who to remove > >people from the Hadoop PMC, etc) that haven't been addressed in any of > >these proposals. > > > >4. It's a proof point - if you can't make the case for YARN then > >there's no way we're going to make a case for splitting the other > >projects (this thread). > > > >Ie this doesn't have to be an all-or-nothing proposition for all > >sub-projects, since the communities don't fall on sub-project > >boundaries. 
> >
> >Thanks,
> >Eli
> >
> >On Tue, Aug 28, 2012 at 7:33 PM, Mattmann, Chris A (388J)
> ><[EMAIL PROTECTED]> wrote:
> >> [decided to minimize traffic and to simply put this in one thread]
> >>
> >> Hi Guys,
> >>
> >> See the recent discussion on these threads:
> >>
> >> YARN as its own Hadoop "sub project": http://s.apache.org/WW1
> >> Maintain a single committer list for the Hadoop project: http://s.apache.org/Owx
> >>
> >> ...and just pay attention to the Hadoop project over the last 3-4 years. It's operating
> >> as a single project, that's masking separate communities that themselves are really
> >> separate ASF projects.
> >>
> >> At the ASF, this has been a problem area called "umbrella" projects and over the years,
> >> all I've seen from them is wasted bandwidth, artificial barriers and the inventions of
> >> new ways to perform process mongering and to reduce the fun in developing software
> >> at this fantastic foundation.
> >>
> >> I've talked about umbrella projects enough. We've diverted conversation enough.
> >> Enough people have tried to act like there is some technical mumbo jumbo that is
> >> preventing the eventual act of higher power that I myself hope comes should these
> >> discussions prove unfruitful through normal means.
> >>
> >> *these. are. separate. projects.*
> >>
> >> *there.are.not.blocker.issues.from.spinning.out.these.projects.as.their.own.communities*
> >>
> >> In this email: http://s.apache.org/rSm
> >>
> >> And in the 2 subsequent follow ons in that thread, I've outlined a process that I'll copy
> >> through below for splitting these projects into their own TLPs:
> >>
> >> -----snip
> >> Process:
> >>
> >> 0. [DISCUSS] thread for <TLP name> in which you talk about #1 and #2 below, potentially draft resolution too.
> >>
> >> 1. Decide on an initial set of *PMC* members. I urge each new TLP to adopt PMC==C.
> >> See reasons I've

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
-
Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Arun C Murthy 2012-09-03, 11:02
Andrew,
On Sep 1, 2012, at 6:32 AM, Andrew Purtell wrote:

> I'd imagine such a MR(v1) in Hadoop, if this happened, would concentrate on
> performance improvements, maybe such things as alternate shuffle plugins.
> Perhaps a HA JobTracker for parity with HDFS.

Lots of this has already happened in branch-1; please look at:

# JT Availability: MAPREDUCE-3837, MAPREDUCE-4328, MAPREDUCE-4603 (WIP)
# Performance: backports of PureJavaCrc32 in spills (MAPREDUCE-782), fadvise backports (MAPREDUCE-3289) and several other misc. fixes.

thanks,
Arun