[DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere
Nigel Daley 2011-01-31, 03:42
Folks,

Now that http://apache-extras.org is launched (https://blogs.apache.org/foundation/entry/the_apache_software_foundation_launches) I'd like to start a discussion on moving contrib components out of common, mapreduce, and hdfs.

These contrib components complicate the builds, cause test failures that nobody seems to care about, have releases that are tied to Hadoop's long release cycles, etc. Most folks I've talked with agree that these contrib components would be better served by being pulled out of Hadoop and hosted elsewhere. The new apache-extras code hosting site seems like a natural *default* location for migrating these contrib projects. Perhaps some should graduate from contrib to src (i.e. from contrib to core of the project they're included in). If folks agree, we'll need to come up with a mapping of contrib component to its final destination and file a jira.

Here are the contrib components by project (hopefully I didn't miss any).

Common Contrib:
  failmon
  hod
  test

MapReduce Contrib:
  capacity-scheduler -- move to MR core?
  data_join
  dynamic-scheduler
  eclipse-plugin
  fairscheduler -- move to MR core?
  gridmix
  index
  mrunit
  mumak
  raid
  sqoop
  streaming -- move to MR core?
  vaidya
  vertica

HDFS Contrib:
  fuse-dfs
  hdfsproxy
  thriftfs

Cheers,
Nige
-
Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere
Nigel Daley 2011-02-09, 17:11
After considering the feedback, I will move forward with calling a separate vote for each contrib (or small groups of contribs). The vote will ask the PMC to abandon the given contrib or move it to core. For those we agree to abandon, I will set up an Attic wiki that will point to the last SVN revision of the contrib. I will also start a Related Projects wiki (if we don't already have one) with pointers to the contrib modules that folks have volunteered to keep developing elsewhere.

Cheers,
Nige
-
Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere
Mattmann, Chris A 2011-02-10, 03:37
Hi Nigel,

My 2 cents -- why is Hadoop re-creating its own mini-Attic? The point of the umbrella project shakedown over the past year at Apache is to stop projects from recreating things like the Incubator (and the Attic) inside of the project.

One suggestion: for your [VOTE] thread that you are calling -- if the result is to attic the project -- move it to the "real" Apache Attic, via a board resolution (you could do a single board resolution from the Hadoop PMC to the Apache Board containing the set of projects to Attic).

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [EMAIL PROTECTED]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-
Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere
Nigel Daley 2011-02-10, 06:35
Hi Chris,

You use the word 'project' over and over. These contrib modules are not Apache projects. They are source code directories within the various Hadoop subprojects. Reading thru attic.apache.org, I don't see how it directly relates. Surely not every 'svn remove <dir>' should be replaced with a move to attic.apache.org.

Cheers,
Nige
-
Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere
Mattmann, Chris A 2011-02-10, 06:53
Hi Nige,

Thanks. I think in fact it does directly relate -- to use your verbiage below -- not every svn remove <dir> requires a [DISCUSS] thread before svn removing it, right?

Hence, my knee-jerk reaction on reading this thread is that these contrib modules are the same as contrib modules in other umbrella projects -- some are small, and likely yes should be part of the core of Hadoop; others are not, and are in fact "mini projects" that have been baking up in Hadoop for a while.

In the case of the former, I agree, +1, those should be moved as part of the Hadoop core "project". In the case of the latter, I do not agree that the Hadoop project should simply create a wiki page and declare by VOTE that there won't be any more development on them. In fact, they are perfect candidates for moving to the Attic, where someone besides the Hadoop PMC might want to pick up on them at a later point in time.

There is no hard and fast rule for the size of a TLP too btw: utilities that run on top of Hadoop could go through e.g., Incubation, and eventually graduate to TLPs.

Cheers,
Chris
-
Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere
Roy T. Fielding 2011-02-10, 21:59
The Apache Attic exists for products that have been released by the ASF but for which there is no longer a community capable of making decisions like "do we release a fix to this security issue".

contrib stuff does not need to go to the Attic -- it can just be svn removed, or svn moved to something like a "sandbox" area that doesn't get released.

....Roy
-
Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere
Bernd Fondermann 2011-02-11, 10:03
-1.

Moving it away from TRUNK so it doesn't affect builds is a much better (and simpler) solution. If someone wants to revive it, they can do so within the bounds of Apache Hadoop and will become a part of the Hadoop community, which would be good.
If you'd move it off-site, if the code ever gets new contributors, it's hard to integrate them (code and contributors) into Hadoop again.

AFAIUI, apache-extras is for placing non-Apache code closer to the related Apache projects, not for moving our code away from our own premises.

Bernd
-
Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere
Tom White 2011-02-11, 18:12
For contrib components that we elect not to remove, I suggest that we remove them from the main build, so that failures in a contrib component don't hinder the main build and release. This is what ZooKeeper does, for example.

Tom
-
Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere
Aaron Kimball 2011-02-11, 21:03
Tom,

How do these contrib components get released then? If the intent of having the code is to eventually produce release artifacts that people can use, then allowing them to further degrade in releasability seems antithetical to the point of keeping the source around. I think users who download Hadoop would be surprised to find that tools/projects under contrib do not operate as advertised.

If the point of retaining the code would be instead to have it as an example for future developers to reference, then maybe it would be better to move them into an "attic" or "unmaintained" directory so it is clear that we do not expect these to stay up to date. If we do expect them to work, and believe that they belong in the project, then I think we need to keep them building -- and yes, that adds to the burden associated with the release of Hadoop as a whole.

- Aaron
-
Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere
Tom White 2011-02-12, 00:55
On Fri, Feb 11, 2011 at 1:03 PM, Aaron Kimball <[EMAIL PROTECTED]> wrote:
> Tom,
>
> How do these contrib components get released then?

In source form - this is what ZooKeeper does.

> If the intent of having the code is to eventually produce release artifacts that people can use, then allowing them to further degrade in releasability seems antithetical to the point of keeping the source around. I think users who download Hadoop would be surprised to find that tools/projects under contrib do not operate as advertised.
>
> If the point of retaining the code would be instead to have it as an example for future developers to reference, then maybe it would be better to move them into an "attic" or "unmaintained" directory so it is clear that we do not expect these to stay up to date. If we do expect them to work, and believe that they belong in the project, then I think we need to keep them building -- and yes, that adds to the burden associated with the release of Hadoop as a whole.

Maybe. We haven't always done a great job of this though.
-
Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere
Nigel Daley 2011-02-12, 06:01
+1. And only include them as source in releases.

Nige

On Feb 11, 2011, at 10:12 AM, Tom White wrote:
> For contrib components that we elect not to remove, I suggest that we remove them from the main build, so that failures in a contrib component don't hinder the main build and release. This is what ZooKeeper does, for example.
-
Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere
Eric Baldeschwieler 2011-03-05, 21:29
Hi Nigel,

I liked your previous plan. Could you summarize your current plan? Are we now planning to leave inactive projects in place? Ship them as source? Seems like this still sends a confusing message. I'm all for removing them from trunk in SVN, leaving a wiki link to their most recent state and letting their contributors make a case to revive them in whatever place they choose.

Keeping projects with small constituencies in our tree seems counterproductive.

E14
-
Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere
Nigel Daley 2011-04-10, 05:13
On Mar 5, 2011, at 1:29 PM, Eric Baldeschwieler wrote:

> Hi Nigel,
>
> I liked your previous plan. Could you summarize your current plan?

I basically gave up. Most everyone I talked to at various meetups was enthusiastically supportive of removing contrib components. When put to a vote on list, however, most were vetoed by 1 or 2 people.

> Are we now planning to leave inactive projects in place?

I guess so. HDFS Proxy seems like an example of this.

> Ship them as source?

I would support that as a fallback.

> Seems like this still sends a confusing message. I'm all for removing them from trunk in SVN, leaving a wiki link to their most recent state and letting their contributors make a case to revive them in whatever place they choose.
>
> Keeping projects with small constituencies in our tree seems counter productive.

Agreed, but we couldn't reach consensus.

Cheers,
Nige

> On Feb 11, 2011, at 10:01 PM, Nigel Daley wrote:
>
>> +1. And only include them as source in releases.
>>
>> Nige
>>
>> On Feb 11, 2011, at 10:12 AM, Tom White wrote:
>>
>>> For contrib components that we elect not to remove, I suggest that we
>>> remove them from the main build, so that failures in a contrib
>>> component don't hinder the main build and release. This is what
>>> ZooKeeper does, for example.
>>>
>>> Tom
>>>
>>> On Fri, Feb 11, 2011 at 2:03 AM, Bernd Fondermann
>>> <[EMAIL PROTECTED]> wrote:
>>>> -1.
>>>>
>>>> Move it away from TRUNK so it doesn't affect builds is a much better
>>>> (and simpler) solution. If someone wants to revive it, he can within
>>>> the bounds of Apache Hadoop and will become a part of the Hadoop
>>>> community, which would be good.
>>>> If you'd move it off-site, if the code ever gets new contributors,
>>>> it's hard to integrate them (code and contributors) into Hadoop again.
>>>>
>>>> AFAIUI, apache-extras is for placing non-Apache code closer to the
>>>> related Apache projects, not for moving our code away from our own
>>>> premises.
>>>>
>>>> Bernd
>>>>
>>>> On Mon, Jan 31, 2011 at 04:42, Nigel Daley <[EMAIL PROTECTED]> wrote:
>>>>> Folks,
>>>>>
>>>>> Now that http://apache-extras.org is launched (https://blogs.apache.org/foundation/entry/the_apache_software_foundation_launches) I'd like to start a discussion on moving contrib components out of common, mapreduce, and hdfs.
>>>>>
>>>>> These contrib components complicate the builds, cause test failures that nobody seems to care about, have releases that are tied to Hadoop's long release cycles, etc. Most folks I've talked with agree that these contrib components would be better served by being pulled out of Hadoop and hosted elsewhere. The new apache-extras code hosting site seems like a natural *default* location for migrating these contrib projects. Perhaps some should graduate from contrib to src (ie from contrib to core of the project they're included in). If folks agree, we'll need to come up with a mapping of contrib component to it's final destination and file a jira.
>>>>>
>>>>> Here are the contrib components by project (hopefully I didn't miss any).
>>>>>
>>>>> Common Contrib:
>>>>> failmon
>>>>> hod
>>>>> test
>>>>>
>>>>>
>>>>> MapReduce Contrib:
>>>>> capacity-scheduler -- move to MR core?
>>>>> data_join
>>>>> dynamic-scheduler
>>>>> eclipse-plugin
>>>>> fairscheduler -- move to MR core?
>>>>> gridmix
>>>>> index
>>>>> mrunit
>>>>> mumak
>>>>> raid
>>>>> sqoop
>>>>> streaming -- move to MR core?
>>>>> vaidya
>>>>> vertica
>>>>>
>>>>>
>>>>> HDFS Contrib:
>>>>> fuse-dfs
>>>>> hdfsproxy
>>>>> thriftfs
>>>>>
>>>>>
>>>>> Cheers,
>>>>> Nige
>>>>>
>>>>
>>
>
-
Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere
Eric Sammer 2011-01-31, 04:22
Huge +1.
Sqoop has continued development at https://github.com/cloudera/sqoop
MRUnit is at https://github.com/esammer/mrunit

I think the former can be easily removed. The latter, per my previous email, I think could be removed as well.

On Sun, Jan 30, 2011 at 10:42 PM, Nigel Daley <[EMAIL PROTECTED]> wrote:

> Folks,
>
> Now that http://apache-extras.org is launched (
> https://blogs.apache.org/foundation/entry/the_apache_software_foundation_launches)
> I'd like to start a discussion on moving contrib components out of common,
> mapreduce, and hdfs.
>
> These contrib components complicate the builds, cause test failures that
> nobody seems to care about, have releases that are tied to Hadoop's long
> release cycles, etc. Most folks I've talked with agree that these contrib
> components would be better served by being pulled out of Hadoop and hosted
> elsewhere. The new apache-extras code hosting site seems like a natural
> *default* location for migrating these contrib projects. Perhaps some
> should graduate from contrib to src (ie from contrib to core of the project
> they're included in). If folks agree, we'll need to come up with a mapping
> of contrib component to it's final destination and file a jira.
>
> Here are the contrib components by project (hopefully I didn't miss any).
>
> Common Contrib:
> failmon
> hod
> test
>
>
> MapReduce Contrib:
> capacity-scheduler -- move to MR core?
> data_join
> dynamic-scheduler
> eclipse-plugin
> fairscheduler -- move to MR core?
> gridmix
> index
> mrunit
> mumak
> raid
> sqoop
> streaming -- move to MR core?
> vaidya
> vertica
>
>
> HDFS Contrib:
> fuse-dfs
> hdfsproxy
> thriftfs
>
>
> Cheers,
> Nige
>

--
Eric Sammer
twitter: esammer
data: www.cloudera.com
-
Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere
Konstantin Boudnik 2011-01-31, 05:24
While this seems to be a very good and long-awaited advancement (btw, thanks to Eric Sammer for making MRUnit the first project to move out!), I have a concern about the lack of Git support on http://apache-extras.org. I am sure that the hosting decision was deeply considered by the Apache board (or the whole Apache community); however, SVN/Mercurial seems like a real drag for most Hadoop developers.

Basically, if contribs are moved to apache-extras.org then we are likely to face the same situation as with core Hadoop: a lot of development is done in personal Git repos forked from a Git/SVN mirror and is then committed back to the main SVN repository, creating an extra cycle.

Shall we not dictate a location for contrib projects once they are moved out of Hadoop? If people feel they would be better served by GitHub, perhaps they should have the option to be hosted there?

--
Take care,
Konstantin (Cos) Boudnik

On Sun, Jan 30, 2011 at 19:42, Nigel Daley <[EMAIL PROTECTED]> wrote:

> Folks,
>
> Now that http://apache-extras.org is launched (https://blogs.apache.org/foundation/entry/the_apache_software_foundation_launches) I'd like to start a discussion on moving contrib components out of common, mapreduce, and hdfs.
>
> These contrib components complicate the builds, cause test failures that nobody seems to care about, have releases that are tied to Hadoop's long release cycles, etc. Most folks I've talked with agree that these contrib components would be better served by being pulled out of Hadoop and hosted elsewhere. The new apache-extras code hosting site seems like a natural *default* location for migrating these contrib projects. Perhaps some should graduate from contrib to src (ie from contrib to core of the project they're included in). If folks agree, we'll need to come up with a mapping of contrib component to it's final destination and file a jira.
>
> Here are the contrib components by project (hopefully I didn't miss any).
>
> Common Contrib:
> failmon
> hod
> test
>
>
> MapReduce Contrib:
> capacity-scheduler -- move to MR core?
> data_join
> dynamic-scheduler
> eclipse-plugin
> fairscheduler -- move to MR core?
> gridmix
> index
> mrunit
> mumak
> raid
> sqoop
> streaming -- move to MR core?
> vaidya
> vertica
>
>
> HDFS Contrib:
> fuse-dfs
> hdfsproxy
> thriftfs
>
>
> Cheers,
> Nige
>
-
Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere
Steve Loughran 2011-01-31, 11:43
On 31/01/11 05:24, Konstantin Boudnik wrote:
> Shall we not dictate a location of contrib projects once they are
> moved of Hadoop? If ppl feel like they are better be served by GitHub
> perhaps they should have an option to get hosted there?

-I see discussions about Git at the ASF infra mailing lists

-the stuff in contrib is code contributed to apache, should still live there if we can keep it going. Which means people have to step up, or we put it in some attic
-
Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere
Konstantin Boudnik 2011-01-31, 16:51
On Mon, Jan 31, 2011 at 03:43, Steve Loughran <[EMAIL PROTECTED]> wrote:
> On 31/01/11 05:24, Konstantin Boudnik wrote:
>>
>> Shall we not dictate a location of contrib projects once they are
>> moved of Hadoop? If ppl feel like they are better be served by GitHub
>> perhaps they should have an option to get hosted there?
>
>
> -I see discussions about Git at the ASF infra mailing lists

Then I withdraw my earlier opinion about github vs. *-extras

> -the stuff in contrib is code contributed to apache, should still live there
> if we can keep it going. Which means people have to step up, or we put it in
> some attic
>
>
-
Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere
Eric Baldeschwieler 2011-01-31, 05:37
+1 - A really good idea to clean up the builds and ownership issues.
I think it is good to have a default location for related projects and apache-extras.org does seem like a logical place. We should also probably add a prominent wiki section to support use of apache-extras.org and to provide links to projects that want to host elsewhere.

On Jan 30, 2011, at 7:42 PM, Nigel Daley wrote:

> Folks,
>
> Now that http://apache-extras.org is launched (https://blogs.apache.org/foundation/entry/the_apache_software_foundation_launches) I'd like to start a discussion on moving contrib components out of common, mapreduce, and hdfs.
>
> These contrib components complicate the builds, cause test failures that nobody seems to care about, have releases that are tied to Hadoop's long release cycles, etc. Most folks I've talked with agree that these contrib components would be better served by being pulled out of Hadoop and hosted elsewhere. The new apache-extras code hosting site seems like a natural *default* location for migrating these contrib projects. Perhaps some should graduate from contrib to src (ie from contrib to core of the project they're included in). If folks agree, we'll need to come up with a mapping of contrib component to it's final destination and file a jira.
>
> Here are the contrib components by project (hopefully I didn't miss any).
>
> Common Contrib:
> failmon
> hod
> test
>
>
> MapReduce Contrib:
> capacity-scheduler -- move to MR core?
> data_join
> dynamic-scheduler
> eclipse-plugin
> fairscheduler -- move to MR core?
> gridmix
> index
> mrunit
> mumak
> raid
> sqoop
> streaming -- move to MR core?
> vaidya
> vertica
>
>
> HDFS Contrib:
> fuse-dfs
> hdfsproxy
> thriftfs
>
>
> Cheers,
> Nige
-
Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere
Eli Collins 2011-01-31, 06:02
+1
Agree with Eric's comments around defaults and documenting them.

How about setting a date by which the projects are created on apache extras and removed from contrib unless someone volunteers to maintain them? I agree apache extras should be the default unless the person volunteering to maintain the project wants to host it elsewhere.

Btw I think raid should also probably move to core, since it's actively worked on. I'll volunteer to create a project for fuse-dfs.

Thanks,
Eli

On Sun, Jan 30, 2011 at 7:42 PM, Nigel Daley <[EMAIL PROTECTED]> wrote:

> Folks,
>
> Now that http://apache-extras.org is launched (https://blogs.apache.org/foundation/entry/the_apache_software_foundation_launches) I'd like to start a discussion on moving contrib components out of common, mapreduce, and hdfs.
>
> These contrib components complicate the builds, cause test failures that nobody seems to care about, have releases that are tied to Hadoop's long release cycles, etc. Most folks I've talked with agree that these contrib components would be better served by being pulled out of Hadoop and hosted elsewhere. The new apache-extras code hosting site seems like a natural *default* location for migrating these contrib projects. Perhaps some should graduate from contrib to src (ie from contrib to core of the project they're included in). If folks agree, we'll need to come up with a mapping of contrib component to it's final destination and file a jira.
>
> Here are the contrib components by project (hopefully I didn't miss any).
>
> Common Contrib:
> failmon
> hod
> test
>
>
> MapReduce Contrib:
> capacity-scheduler -- move to MR core?
> data_join
> dynamic-scheduler
> eclipse-plugin
> fairscheduler -- move to MR core?
> gridmix
> index
> mrunit
> mumak
> raid
> sqoop
> streaming -- move to MR core?
> vaidya
> vertica
>
>
> HDFS Contrib:
> fuse-dfs
> hdfsproxy
> thriftfs
>
>
> Cheers,
> Nige
>
-
Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere
Owen O'Malley 2011-01-31, 07:19
On Jan 30, 2011, at 7:42 PM, Nigel Daley wrote:

> Now that http://apache-extras.org is launched (https://blogs.apache.org/foundation/entry/the_apache_software_foundation_launches
> ) I'd like to start a discussion on moving contrib components out of
> common, mapreduce, and hdfs.

The PMC can't "move" code to Apache extras. It can only choose to abandon code that it doesn't want to support any longer. As a separate action some group of developers may create projects in Apache Extras based on the code from Hadoop.

Therefore the question is really what if any code Hadoop wants to abandon. That is a good question and one that we should ask ourselves occasionally.

After a quick consideration, my personal list would look like:

failmon
fault injection
fuse-dfs
hod
kfs

Also note that pushing code out of Hadoop has a high cost. There are at least 3 forks of the hadoop-gpl-compression code. That creates a lot of confusion for the users. A lot of users never go to the work to figure out which fork and branch of hadoop-gpl-compression work with the version of Hadoop they installed.

-- Owen
-
Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere
Dhruba Borthakur 2011-01-31, 08:41
I agree with Owen. If we move code out of the contrib project, then it is more likely to create confusion among users, especially when multiple versions of the code base float around.

But I agree that we should purge contrib code that is not being used or not being actively developed.

thanks,
dhruba

On Sun, Jan 30, 2011 at 11:19 PM, Owen O'Malley <[EMAIL PROTECTED]> wrote:

>
> On Jan 30, 2011, at 7:42 PM, Nigel Daley wrote:
>
> > Now that http://apache-extras.org is launched (
>> https://blogs.apache.org/foundation/entry/the_apache_software_foundation_launches)
>> I'd like to start a discussion on moving contrib components out of common,
>> mapreduce, and hdfs.
>>
>
> The PMC can't "move" code to Apache extras. It can only choose to abandon
> code that it doesn't want to support any longer. As a separate action some
> group of developers may create projects in Apache Extras based on the code
> from Hadoop.
>
> Therefore the question is really what if any code Hadoop wants to abandon.
> That is a good question and one that we should ask ourselves occasionally.
>
> After a quick consideration, my personal list would look like:
>
> failmon
> fault injection
> fuse-dfs
> hod
> kfs
>
> Also note that pushing code out of Hadoop has a high cost. There are at
> least 3 forks of the hadoop-gpl-compression code. That creates a lot of
> confusion for the users. A lot of users never go to the work to figure out
> which fork and branch of hadoop-gpl-compression work with the version of
> Hadoop they installed.
>
> -- Owen
>
>

--
Connect to me at http://www.facebook.com/dhruba
-
Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere
Konstantin Boudnik 2011-01-31, 16:49
On Sun, Jan 30, 2011 at 23:19, Owen O'Malley <[EMAIL PROTECTED]> wrote:
>
> On Jan 30, 2011, at 7:42 PM, Nigel Daley wrote:
>
>> Now that http://apache-extras.org is launched
>> (https://blogs.apache.org/foundation/entry/the_apache_software_foundation_launches)
>> I'd like to start a discussion on moving contrib components out of common,
>> mapreduce, and hdfs.
>
> The PMC can't "move" code to Apache extras. It can only choose to abandon
> code that it doesn't want to support any longer. As a separate action some
> group of developers may create projects in Apache Extras based on the code
> from Hadoop.
>
> Therefore the question is really what if any code Hadoop wants to abandon.
> That is a good question and one that we should ask ourselves occasionally.
>
> After a quick consideration, my personal list would look like:
>
> failmon
> fault injection

This is the best way to kill a project as tightly coupled with the core code as fault injection.

So, if you really want to kill it - then move it.

> fuse-dfs
> hod
> kfs
>
> Also note that pushing code out of Hadoop has a high cost. There are at
> least 3 forks of the hadoop-gpl-compression code. That creates a lot of
> confusion for the users. A lot of users never go to the work to figure out
> which fork and branch of hadoop-gpl-compression work with the version of
> Hadoop they installed.
>
> -- Owen
>
>
-
Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere
Sanjay Radia 2011-02-17, 04:25
On Jan 31, 2011, at 10:19 PM, Konstantin Boudnik wrote:

> On Sun, Jan 30, 2011 at 23:19, Owen O'Malley <[EMAIL PROTECTED]>
> wrote:
>>
>> On Jan 30, 2011, at 7:42 PM, Nigel Daley wrote:
>>
>>> Now that http://apache-extras.org is launched
>>> (https://blogs.apache.org/foundation/entry/the_apache_software_foundation_launches
>>> )
>>> I'd like to start a discussion on moving contrib components out of
>>> common,
>>> mapreduce, and hdfs.
>>
>> The PMC can't "move" code to Apache extras. It can only choose to
>> abandon
>> code that it doesn't want to support any longer. As a separate
>> action some
>> group of developers may create projects in Apache Extras based on
>> the code
>> from Hadoop.
>>
>> Therefore the question is really what if any code Hadoop wants to
>> abandon.
>> That is a good question and one that we should ask ourselves
>> occasionally.
>>
>> After a quick consideration, my personal list would look like:
>>
>> failmon
>> fault injection
>
> This is the best way to kill a project as tightly coupled with the
> core code as fault injection.
>
> So, if you really want to kill it - then move it.

Nigel/Owen did not say "kill it". Folks were simply listing potential projects to move out.
If you feel that it should stay in then simply say so and give the reasons -- looks like your reason is "tight coupling".

sanjay

>
>> fuse-dfs
>> hod
>> kfs
>>
>> Also note that pushing code out of Hadoop has a high cost. There
>> are at
>> least 3 forks of the hadoop-gpl-compression code. That creates a
>> lot of
>> confusion for the users. A lot of users never go to the work to
>> figure out
>> which fork and branch of hadoop-gpl-compression work with the
>> version of
>> Hadoop they installed.
>>
>> -- Owen
>>
>>
-
Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere
Konstantin Boudnik 2011-02-17, 05:30
Yes, Sanjay. The reason is 'tight coupling'. In fact, it was I who was opposing it - not Nigel. I guess you misread the thread ;)

On Wed, Feb 16, 2011 at 20:25, Sanjay Radia <[EMAIL PROTECTED]> wrote:

>
> On Jan 31, 2011, at 10:19 PM, Konstantin Boudnik wrote:
>
>> On Sun, Jan 30, 2011 at 23:19, Owen O'Malley <[EMAIL PROTECTED]> wrote:
>>>
>>> On Jan 30, 2011, at 7:42 PM, Nigel Daley wrote:
>>>
>>>> Now that http://apache-extras.org is launched
>>>>
>>>> (https://blogs.apache.org/foundation/entry/the_apache_software_foundation_launches)
>>>> I'd like to start a discussion on moving contrib components out of
>>>> common,
>>>> mapreduce, and hdfs.
>>>
>>> The PMC can't "move" code to Apache extras. It can only choose to abandon
>>> code that it doesn't want to support any longer. As a separate action
>>> some
>>> group of developers may create projects in Apache Extras based on the
>>> code
>>> from Hadoop.
>>>
>>> Therefore the question is really what if any code Hadoop wants to
>>> abandon.
>>> That is a good question and one that we should ask ourselves
>>> occasionally.
>>>
>>> After a quick consideration, my personal list would look like:
>>>
>>> failmon
>>> fault injection
>>
>> This is the best way to kill a project as tightly coupled with the
>> core code as fault injection.
>>
>> So, if you really want to kill it - then move it.
>
>
> Nigel/Owen did not say "kill it". Folks were simply listing potential
> projects to move out.
> If you feel that it should stay in then simply say so and give the reasons
> -- looks like your reason is "tight coupling".
>
>
> sanjay
>
>>
>>> fuse-dfs
>>> hod
>>> kfs
>>>
>>> Also note that pushing code out of Hadoop has a high cost. There are at
>>> least 3 forks of the hadoop-gpl-compression code. That creates a lot of
>>> confusion for the users. A lot of users never go to the work to figure
>>> out
>>> which fork and branch of hadoop-gpl-compression work with the version of
>>> Hadoop they installed.
>>>
>>> -- Owen
>>>
>>>
>
>
-
Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere
Milind Bhandarkar 2011-01-31, 23:24
Owen,
I am surprised to not see jute (aka hadoop recordio) on this list.

- milind

On Jan 30, 2011, at 11:19 PM, Owen O'Malley wrote:

>
> On Jan 30, 2011, at 7:42 PM, Nigel Daley wrote:
>
>> Now that http://apache-extras.org is launched (https://blogs.apache.org/foundation/entry/the_apache_software_foundation_launches) I'd like to start a discussion on moving contrib components out of common, mapreduce, and hdfs.
>
> The PMC can't "move" code to Apache extras. It can only choose to abandon code that it doesn't want to support any longer. As a separate action some group of developers may create projects in Apache Extras based on the code from Hadoop.
>
> Therefore the question is really what if any code Hadoop wants to abandon. That is a good question and one that we should ask ourselves occasionally.
>
> After a quick consideration, my personal list would look like:
>
> failmon
> fault injection
> fuse-dfs
> hod
> kfs
>
> Also note that pushing code out of Hadoop has a high cost. There are at least 3 forks of the hadoop-gpl-compression code. That creates a lot of confusion for the users. A lot of users never go to the work to figure out which fork and branch of hadoop-gpl-compression work with the version of Hadoop they installed.
>
> -- Owen
>

---
Milind Bhandarkar
[EMAIL PROTECTED]
-
Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere
Todd Lipcon 2011-01-31, 23:23
On Sun, Jan 30, 2011 at 11:19 PM, Owen O'Malley <[EMAIL PROTECTED]> wrote:
>
> Also note that pushing code out of Hadoop has a high cost. There are at
> least 3 forks of the hadoop-gpl-compression code. That creates a lot of
> confusion for the users. A lot of users never go to the work to figure out
> which fork and branch of hadoop-gpl-compression work with the version of
> Hadoop they installed.
>
>

Indeed it creates confusion, but in my opinion it has been very successful modulo that confusion.

In particular, Kevin and I (who each have a repo on github but basically co-maintain a branch) have done about 8 bugfix releases of LZO in the last year. The ability to take a bug and turn it around into a release within a few days has been very beneficial to the users. If it were part of core Hadoop, people would be forced to live with these blocker bugs for months at a time between dot releases.

IMO the more we can take non-core components and move them to separate release timelines, the better. Yes, it is harder for users, but it also is easier for them when they hit a bug - they don't have to wait months for a wholesale upgrade which might contain hundreds of other changes to core components. I think this will also help the situation where people have set up shop on branches -- a lot of the value of these branches comes from the frequency of backports and bugfixes to "non-core" components. If the non-core stuff were on a faster timeline upstream, we could maintain core stability while also offering people the latest and greatest libraries, tools, codecs, etc.

-Todd

--
Todd Lipcon
Software Engineer, Cloudera
-
Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere
Aaron Kimball 2011-02-01, 05:41
+1 to this process in general.
In particular, tools like MRUnit can benefit from having an independent release due to where they are used in a project's lifecycle. MRUnit should be specified as a test dependency, whereas Hadoop itself is a compile/runtime dependency. As it stands, there isn't an easy way to manage this. This would increase flexibility for this tool, probably for others as well.

- Aaron

On Mon, Jan 31, 2011 at 3:23 PM, Todd Lipcon <[EMAIL PROTECTED]> wrote:

> On Sun, Jan 30, 2011 at 11:19 PM, Owen O'Malley <[EMAIL PROTECTED]>
> wrote:
>
> >
> > Also note that pushing code out of Hadoop has a high cost. There are at
> > least 3 forks of the hadoop-gpl-compression code. That creates a lot of
> > confusion for the users. A lot of users never go to the work to figure
> out
> > which fork and branch of hadoop-gpl-compression work with the version of
> > Hadoop they installed.
> >
> >
> Indeed it creates confusion, but in my opinion it has been very successful
> modulo that confusion.
>
> In particular, Kevin and I (who each have a repo on github but basically
> co-maintain a branch) have done about 8 bugfix releases of LZO in the last
> year. The ability to take a bug and turn it around into a release within a
> few days has been very beneficial to the users. If it were part of core
> Hadoop, people would be forced to live with these blocker bugs for months
> at
> a time between dot releases.
>
> IMO the more we can take non-core components and move them to separate
> release timelines, the better. Yes, it is harder for users, but it also is
> easier for them when they hit a bug - they don't have to wait months for a
> wholesale upgrade which might contain hundreds of other changes to core
> components. I think this will also help the situation where people have set
> up shop on branches -- a lot of the value of these branches comes from the
> frequency of backports and bugfixes to "non-core" components. If the
> non-core stuff were on a faster timeline upstream, we could maintain core
> stability while also offering people the latest and greatest libraries,
> tools, codecs, etc.
>
> -Todd
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
-
Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere
Allen Wittenauer 2011-02-01, 09:02
On Jan 31, 2011, at 3:23 PM, Todd Lipcon wrote:

> On Sun, Jan 30, 2011 at 11:19 PM, Owen O'Malley <[EMAIL PROTECTED]> wrote:
>
>>
>> Also note that pushing code out of Hadoop has a high cost. There are at
>> least 3 forks of the hadoop-gpl-compression code. That creates a lot of
>> confusion for the users. A lot of users never go to the work to figure out
>> which fork and branch of hadoop-gpl-compression work with the version of
>> Hadoop they installed.
>>
>>
> Indeed it creates confusion, but in my opinion it has been very successful
> modulo that confusion.

I'm not sure how the above works with what you wrote below:

> In particular, Kevin and I (who each have a repo on github but basically
> co-maintain a branch) have done about 8 bugfix releases of LZO in the last
> year. The ability to take a bug and turn it around into a release within a
> few days has been very beneficial to the users. If it were part of core
> Hadoop, people would be forced to live with these blocker bugs for months at
> a time between dot releases.

So is the expectation that users would have to follow breadcrumbs to the github dumping ground, then try to figure out which repo is the 'better' choice for their usage? Using LZO as an example, it appears we have a choice of Kevin's, yours, or the master without even taking into consideration any tags. That sounds like a recipe for disaster that's even worse than what we have today.

> IMO the more we can take non-core components and move them to separate
> release timelines, the better. Yes, it is harder for users, but it also is
> easier for them when they hit a bug - they don't have to wait months for a
> wholesale upgrade which might contain hundreds of other changes to core
> components.

I'd agree except for one thing: even when users do provide patches to contrib components we ignore them. How long have those patches for HOD been sitting there in the patch queue? So of course they wait months/years--because we seemingly ignore anything that isn't important to us. Unfortunately, that covers a large chunk of contrib. :(
-
Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere
Tom White 2011-02-01, 17:37
+1 for the reasons already cited: independent release cycles, testing/build problems, lack of maintenance, etc.

I think we should strongly discourage new contrib components in favour of Apache Extras or github, remove inactive contrib components, and also allow maintainers to move components out if they volunteer to.

HBase moved all its contrib components out of the main tree a few months back - can anyone comment how that worked out?

I agree that we should move streaming (MAPREDUCE-602) and the schedulers to the main codebase. With work like MAPREDUCE-1478 we can put these components into a library tree so that the libraries can depend on core, but core doesn't depend on the libraries.

Milind: Record IO is in Common (in the main tree, not a contrib component), and was deprecated in 0.21.0. We could remove it in a future release.

Cheers,
Tom

On Tue, Feb 1, 2011 at 1:02 AM, Allen Wittenauer <[EMAIL PROTECTED]> wrote:

>
> On Jan 31, 2011, at 3:23 PM, Todd Lipcon wrote:
>
>> On Sun, Jan 30, 2011 at 11:19 PM, Owen O'Malley <[EMAIL PROTECTED]> wrote:
>>
>>>
>>> Also note that pushing code out of Hadoop has a high cost. There are at
>>> least 3 forks of the hadoop-gpl-compression code. That creates a lot of
>>> confusion for the users. A lot of users never go to the work to figure out
>>> which fork and branch of hadoop-gpl-compression work with the version of
>>> Hadoop they installed.
>>>
>>>
>> Indeed it creates confusion, but in my opinion it has been very successful
>> modulo that confusion.
>
> I'm not sure how the above works with what you wrote below:
>
>> In particular, Kevin and I (who each have a repo on github but basically
>> co-maintain a branch) have done about 8 bugfix releases of LZO in the last
>> year. The ability to take a bug and turn it around into a release within a
>> few days has been very beneficial to the users. If it were part of core
>> Hadoop, people would be forced to live with these blocker bugs for months at
>> a time between dot releases.
>
> So is the expectation that users would have to follow bread crumbs to the github dumping ground, then try to figure out which repo is the 'better' choice for their usage? Using LZO as an example, it appears we have a choice of kevin's, your's, or the master without even taking into consideration any tags. That sounds like a recipe for disaster that's even worse than what we have today.
>
>
>> IMO the more we can take non-core components and move them to separate
>> release timelines, the better. Yes, it is harder for users, but it also is
>> easier for them when they hit a bug - they don't have to wait months for a
>> wholesale upgrade which might contain hundreds of other changes to core
>> components.
>
> I'd agree except for one thing: even when users do provide patches to contrib components we ignore them. How long have those patches for HOD been sitting there in the patch queue? So of course they wait months/years--because we seemingly ignore anything that isn't important to us. Unfortunately, that covers a large chunk of contrib. :(
>
>
>
-
Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere
Todd Lipcon 2011-02-01, 19:54
On Tue, Feb 1, 2011 at 9:37 AM, Tom White <[EMAIL PROTECTED]> wrote:
> HBase moved all its contrib components out of the main tree a few
> months back - can anyone comment how that worked out?

Sure. For each contrib:

ec2: no longer exists, and now has been integrated into Whirr and much improved. Whirr has made several releases in the time that HBase has made one. The whirr contributors know way more about cloud deployment than the HBase contributors (except where they happen to overlap). Strong net positive.

mdc_replication: pulled into core since it's developed by core committers and also needs a fair amount of tight integration with core components

stargate: pulled into core - it was only in contrib as a sort of staging ground - it's really an improved/new version of the "rest" interface we already had in core.

transactional: moved to github - this has languished a bit on github because only one person was actively maintaining it. However, it had already been "languishing" as part of contrib - even though it compiled, it never really worked very well in HBase trunk. So, moving it to a place where it's languished has just made it more obvious what was already true - that it isn't a well supported component (yet). Recently it's been taken back up by the author of it - if it develops a large user base it can move quickly and evolve without waiting on our release. Net: probably a wash

So, overall, I'd say it was a good decision. Though we never had the same number of contribs that Hadoop seems to have sprouted.

-Todd

--
Todd Lipcon
Software Engineer, Cloudera
-
Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere
Todd Lipcon 2011-02-01, 19:46
On Tue, Feb 1, 2011 at 1:02 AM, Allen Wittenauer <[EMAIL PROTECTED]> wrote:

> So is the expectation that users would have to follow bread crumbs
> to the github dumping ground, then try to figure out which repo is the
> 'better' choice for their usage? Using LZO as an example, it appears we
> have a choice of kevin's, your's, or the master without even taking into
> consideration any tags. That sounds like a recipe for disaster that's even
> worse than what we have today.

Kevin's and mine are currently identical (0e7005136e4160ed4cc157c4ddd7f4f1c6e11ffa).

Not sure who "the master" is. Maybe you're referring to the Google Code repo? The reason we started working on github over a year ago is that the bugs we reported (and provided diffs for) in the Google Code project were ignored. For example:
http://code.google.com/p/hadoop-gpl-compression/issues/detail?id=17

In fact this repo hasn't been updated since Sep '09:
http://code.google.com/p/hadoop-gpl-compression/source/list

Github provided an excellent place to collaborate on the project, make progress, fix bugs, and provide a better product for the users.

As for "dumping ground," I don't quite follow your point - we develop in the open, accept pull requests from users, and code review each other's changes. Since October every commit has either been contributed by or fixes a bug reported by a user completely outside of the organizations where Kevin and I work.

I agree that it's a bit of "breadcrumb following" to find the repo, though. We do at least have a link on the wiki:
http://wiki.apache.org/hadoop/UsingLzoCompression
which points to Kevin's repo.

Perhaps the best solution here is to add a page to the official Hadoop site (not just the wiki) with links to actively maintained contrib projects?

> > IMO the more we can take non-core components and move them to separate
> > release timelines, the better. Yes, it is harder for users, but it also is
> > easier for them when they hit a bug - they don't have to wait months for a
> > wholesale upgrade which might contain hundreds of other changes to core
> > components.
>
> I'd agree except for one thing: even when users do provide patches
> to contrib components we ignore them. How long have those patches for HOD
> been sitting there in the patch queue? So of course they wait
> months/years--because we seemingly ignore anything that isn't important to
> us. Unfortunately, that covers a large chunk of contrib. :(

True - we ignore them because the core contributors generally have little clue about the contrib components, so don't feel qualified to review. I'll happily admit that I've never run failmon, index, dynamic-scheduler, eclipse-plugin, data_join, mumak, or vertica contribs.

Wouldn't you rather these components lived on github so the people who wrote them could update them as they wished without having to wait on committers who have little to no clue about how to evaluate the changes?

-Todd

--
Todd Lipcon
Software Engineer, Cloudera
-
Re: [DISCUSS] Move common, hdfs, mapreduce contrib components to apache-extras.org or elsewhere
Steve Loughran 2011-01-31, 11:47
On 31/01/11 03:42, Nigel Daley wrote:
> Folks,
>
> Now that http://apache-extras.org is launched (https://blogs.apache.org/foundation/entry/the_apache_software_foundation_launches) I'd like to start a discussion on moving contrib components out of common, mapreduce, and hdfs.
>
> These contrib components complicate the builds, cause test failures that nobody seems to care about, have releases that are tied to Hadoop's long release cycles, etc. Most folks I've talked with agree that these contrib components would be better served by being pulled out of Hadoop and hosted elsewhere. The new apache-extras code hosting site seems like a natural *default* location for migrating these contrib projects. Perhaps some should graduate from contrib to src (ie from contrib to core of the project they're included in). If folks agree, we'll need to come up with a mapping of contrib component to it's final destination and file a jira.
>
> Here are the contrib components by project (hopefully I didn't miss any).
>
> Common Contrib:
> failmon
> hod
> test
>
> MapReduce Contrib:
> capacity-scheduler -- move to MR core?
> data_join
> dynamic-scheduler
> eclipse-plugin
> fairscheduler -- move to MR core?
> gridmix
> index
> mrunit
> mumak
> raid
> sqoop
> streaming -- move to MR core?
> vaidya
> vertica
>

+1 for the schedulers in core
+1 for streaming

For the "accessories", they are really separate projects that work with Hadoop, but could have separate release schedules:
-move them to incubation, try and staff them.
-if they aren't resourced, then that means they are dead code

I'm -1 to having any support for filesystems other than Posix and HDFS in there, =0 on S3, but it's used widely enough it should stay in, especially as Amazon do apparently provide some funding for testing.

Because, as Nigel points out, testing is the enemy. If you don't have the implementation of the filesystem in question, there is no way to be sure that some change works, you can't use it, release it saying "it works", or field bug reports. Testing and releasing of filesystem interfaces should be the responsibility of the filesystem suppliers or whoever wants to develop the bridge from the FS to Hadoop.

This raises another issue which I've been thinking of recently: how do you define "compatibility"? If, for example, my colleagues and I were to stand up and say "our FS is compatible with Apache Hadoop", what does that mean?

-Steve