
Hadoop, mail # general - [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Konstantin Shvachko 2012-08-30, 10:12
On Wed, Aug 29, 2012 at 4:54 PM, Mattmann, Chris A (388J)
> OK I lied and said I wouldn't reply :)

Long thread. I just picked one of Chris's emails at random (as he is the initiator) to reply to.

You are basically saying there has been a history of community problems
in the Hadoop project,
and proposing a technical solution: split the project by replicating
the source base under three new names,
implying that this will solve the community problems we (the Hadoop
community) are facing.

I see several issues.

1. There are other ways to split the project.
We essentially have a "natural" split of the project already in place.
Hadoop 1, Hadoop 2, Hadoop 0.23, the Trunk
are in a sense competing projects in their own right, each with its own
contributors and release cycle.

2. From a technical (not community) viewpoint, your "svn copy" is ugly,
as it creates a lot of code duplication and will result in a
maintenance nightmare and/or
will require many man-months to fix. My point is that you cannot
neglect "technical issues" when you solve community problems.

3. I am as skeptical as Todd that the community problems will be
solved by simply TLP-ing the three projects.
Two years ago Hadoop was in crisis, as vendors were producing their own
releases and calling them Hadoop.
I think this was solved, but "poor community behavior" and contention
remained, whether we embrace it or not.

4. Having said the above, separating the projects seems reasonable.
(See timing though)
HDFS will inevitably have to inherit and maintain most of Common.
I totally understand the frustration of people who just put a huge effort
into merging
the sources back under a common root.

5. Timing is important.
Waiting until Hadoop 2 is stable as Arun suggested earlier would
probably be too long.
Doing it next week, without discussing and solving the technical issues
listed in the thread, would be premature.
I think the Hadoop 0.23.3 release, backed by Yahoo production, has the
potential to become
the next stable version, letting the project move ahead from the
four-year-old code base.
We should help that happen first, and make the necessary preparations for
the split in the meantime.