Hadoop general mailing list - Proposal: Further Project Split(s)


Re: Proposal: Further Project Split(s)
Mattmann, Chris A 2011-04-01, 17:06
LOL@Chris!!!

On Apr 1, 2011, at 1:57 AM, Chris Douglas wrote:

> Experience developing Hadoop has shown that we not only need to
> partition our projects for more active releases, but we also should
> explore speculative project splits. For this, a Hadoop.next() project
> should track the development of a project scheduler that can partition
> the Hadoop subprojects, possibly running a second version of a
> subproject in parallel. Downstream subprojects and TLPs automatically
> accept whichever releases first as a dependency. Implementation should
> combine ant, ivy, maven, and at least one legacy Hadoop build tool (to
> be written).
>
> Of course, not all of these subprojects will succeed. When one fails
> (or is too slow with its project reports), the project scheduler will
> be responsible for respawning it in the Incubator.
>
> The project scheduler will, of course, be pluggable. -C
>
> On Fri, Apr 1, 2011 at 1:19 AM, Aaron T. Myers <[EMAIL PROTECTED]> wrote:
>> Hello Hadoop Community,
>>
>> Given the tremendous positive feedback we've all had regarding the HDFS,
>> MapReduce, and Common project split, I'd like to propose we take the next
>> step and further separate the existing projects.
>>
>> I propose we begin by splitting the MapReduce project into separate "Map"
>> and "Reduce" sub-projects. This will provide us the opportunity to tease out
>> the complex interdependencies between "map" and "reduce" that exist today,
>> to encourage us to write more modular and isolated code, which should speed
>> releases. This will also aid our users who exclusively run map-only or
>> reduce-only jobs. These are important use-cases, and so should be given high
>> priority.
>>
>> Given that these two portions of the existing MapReduce project share a
>> great deal of code, we will likely need to release these two new projects
>> concurrently at first, but the eventual goal should certainly be to be able
>> to release "Map" and "Reduce" independently. This seems intuitive to me,
>> given the remarkable recent advancements in the academic community regarding
>> "reduce," while the research coming out of the "map" academics has largely
>> stagnated of late.
>>
>> If this proposal is accepted, and it has the success I think it will, then
>> we should strongly consider splitting the other two projects as well. My gut
>> instinct is that we should split "HDFS" into "HD" and "FS" sub-projects, and
>> simply rename the "Common" project to "C'Mon." We can think about the
>> details of what exactly these project splits mean later.
>>
>> Please let me know what you think.
>>
>> Best,
>> Aaron
>>
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [EMAIL PROTECTED]
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++