Experience developing Hadoop has shown that we not only need to
partition our projects for more active releases, but we also should
explore speculative project splits. For this, a Hadoop.next() project
should track the development of a project scheduler that can partition
the Hadoop subprojects, possibly running a second version of a
subproject in parallel. Downstream subprojects and TLPs would automatically
accept whichever version releases first as a dependency. Implementation should
combine ant, ivy, maven, and at least one legacy Hadoop build tool (to
Of course, not all of these subprojects will succeed. When one fails
(or is too slow with its project reports), the project scheduler will
be responsible for respawning it in the Incubator.
The project scheduler will, of course, be pluggable.

-C
On Fri, Apr 1, 2011 at 1:19 AM, Aaron T. Myers <[EMAIL PROTECTED]> wrote:
> Hello Hadoop Community,
> Given the tremendous positive feedback we've all had regarding the HDFS,
> MapReduce, and Common project split, I'd like to propose we take the next
> step and further separate the existing projects.
> I propose we begin by splitting the MapReduce project into separate "Map"
> and "Reduce" sub-projects. This will provide us the opportunity to tease out
> the complex interdependencies between "map" and "reduce" that exist today,
> to encourage us to write more modular and isolated code, which should speed
> releases. This will also aid our users who exclusively run map-only or
> reduce-only jobs. These are important use-cases, and so should be given high
> priority.
> Given that these two portions of the existing MapReduce project share a
> great deal of code, we will likely need to release these two new projects
> concurrently at first, but the eventual goal should certainly be to be able
> to release "Map" and "Reduce" independently. This seems intuitive to me,
> given the remarkable recent advancements in the academic community regarding
> "reduce," while the research coming out of the "map" academics has largely
> stagnated of late.
> If this proposal is accepted, and it has the success I think it will, then
> we should strongly consider splitting the other two projects as well. My gut
> instinct is that we should split "HDFS" into "HD" and "FS" sub-projects, and
> simply rename the "Common" project to "C'Mon." We can think about the
> details of what exactly these project splits mean later.
> Please let me know what you think.