-
Proposal: Further Project Split(s)
Aaron T. Myers 2011-04-01, 08:19
Hello Hadoop Community,

Given the tremendous positive feedback we've all had regarding the HDFS, MapReduce, and Common project split, I'd like to propose we take the next step and further separate the existing projects.

I propose we begin by splitting the MapReduce project into separate "Map" and "Reduce" sub-projects. This will provide us the opportunity to tease out the complex interdependencies between "map" and "reduce" that exist today, to encourage us to write more modular and isolated code, which should speed releases. This will also aid our users who exclusively run map-only or reduce-only jobs. These are important use-cases, and so should be given high priority.

Given that these two portions of the existing MapReduce project share a great deal of code, we will likely need to release these two new projects concurrently at first, but the eventual goal should certainly be to be able to release "Map" and "Reduce" independently. This seems intuitive to me, given the remarkable recent advancements in the academic community regarding "reduce," while the research coming out of the "map" academics has largely stagnated of late.

If this proposal is accepted, and it has the success I think it will, then we should strongly consider splitting the other two projects as well. My gut instinct is that we should split "HDFS" into "HD" and "FS" sub-projects, and simply rename the "Common" project to "C'Mon." We can think about the details of what exactly these project splits mean later.

Please let me know what you think.

Best,
Aaron
-
Re: Proposal: Further Project Split(s)
Todd Lipcon 2011-04-01, 08:25
+4.01. This is a terrific idea.
--
Todd Lipcon
Software Engineer, Cloudera
-
Re: Proposal: Further Project Split(s)
Chris Douglas 2011-04-01, 08:57
Experience developing Hadoop has shown that we not only need to partition our projects for more active releases, but we also should explore speculative project splits. For this, a Hadoop.next() project should track the development of a project scheduler that can partition the Hadoop subprojects, possibly running a second version of a subproject in parallel. Downstream subprojects and TLPs automatically accept whichever releases first as a dependency. Implementation should combine ant, ivy, maven, and at least one legacy Hadoop build tool (to be written).

Of course, not all of these subprojects will succeed. When one fails (or is too slow with its project reports), the project scheduler will be responsible for respawning it in the Incubator.

The project scheduler will, of course, be pluggable. -C
-
Re: Proposal: Further Project Split(s)
Nigel Daley 2011-04-01, 15:26
-1+2. This could potentially allow us to replace Jenkins with Hadoop for our build and test infrastructure. That would be awesome!
n.
-
Re: Proposal: Further Project Split(s)
Patrick Angeles 2011-04-01, 16:13
+1
This will allow Hadoop to better compete with GoDaddy's "Hadoop Killer" skunkworks project.
-
Re: Proposal: Further Project Split(s)
Mattmann, Chris A 2011-04-01, 17:06
LOL@Chris!!!
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [EMAIL PROTECTED]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-
Re: Proposal: Further Project Split(s)
Allen Wittenauer 2011-04-01, 17:40
On Apr 1, 2011, at 1:57 AM, Chris Douglas wrote:
> Experience developing Hadoop has shown that we not only need to
> partition our projects for more active releases, but we also should
> explore speculative project splits. For this, a Hadoop.next() project
> should track the development of a project scheduler that can partition
> the Hadoop subprojects, possibly running a second version of a
> subproject in parallel. Downstream subprojects and TLPs automatically
> accept whichever releases first as a dependency. Implementation should
> combine ant, ivy, maven, and at least one legacy Hadoop build tool (to
> be written).

-1, until it supports eclipse.
-
Re: Proposal: Further Project Split(s)
Brian Bockelman 2011-04-01, 18:29
On Apr 1, 2011, at 12:40 PM, Allen Wittenauer wrote:
> -1, until it supports eclipse.

-1, until it supports emacs
-
Re: Proposal: Further Project Split(s)
Amr Awadallah 2011-04-01, 18:30
Strong -1 from me, this is *idiotic* since we first need to split the NN and DN into separate projects.

-- amr
-
Re: Proposal: Further Project Split(s)
Mahadev Konar 2011-04-01, 18:35
+1. Brilliant idea!
--
thanks
mahadev
@mahadevkonar
-
Re: Proposal: Further Project Split(s)
Konstantin Boudnik 2011-04-01, 18:41
On Fri, Apr 1, 2011 at 08:26, Nigel Daley <[EMAIL PROTECTED]> wrote:
> -1+2. This could potentially allow us to replace Jenkins with Hadoop for our build and test infrastructure. That would be awesome!

Has anyone checked a calendar lately?
-
Re: Proposal: Further Project Split(s)
Allen Wittenauer 2011-04-01, 19:35
On Apr 1, 2011, at 11:41 AM, Konstantin Boudnik wrote:
> Has anyone checked a calendar lately?

No. My calendar application's map tasks are stuck behind our PYMK workflow.
-
Re: Proposal: Further Project Split(s)
Konstantin Boudnik 2011-04-01, 19:46
And I tend to believe all sorts of stuff on this particular day because it happens to be my birthday ;(