Hadoop, mail # general - [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project


Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Mattmann, Chris A 2012-08-30, 13:51
Hi Konstantin,

On Aug 30, 2012, at 3:12 AM, Konstantin Shvachko wrote:

> On Wed, Aug 29, 2012 at 4:54 PM, Mattmann, Chris A (388J)
> <[EMAIL PROTECTED]> wrote:
>> OK I lied and said I wouldn't reply :)
>
> Long thread. I just picked one of Chris's emails at random (as the initiator) to reply to.
>
> Chris,
> You are basically saying there's been a history of community problems
> in Hadoop project,
> and proposing a technical solution to split the project by replicating
> the source base under three new names,
> implying that this will solve the community problems we (the Hadoop
> community) are facing.

Well actually the replication of the source code is just a small part of
what I was proposing (and one that I don't really care about, and that
isn't crucial to what I'm saying). Breaking the project up into groups of
individuals who actually share similar views, who can reach consensus on
things (besides technical issues), and who work in the Apache way is what
I was really proposing.

>
> I see several issues.
>
> 1. There are other ways to split the project.
> We essentially have a "natural" split of the project already in place.
> Hadoop 1, Hadoop 2, Hadoop 0.23, the Trunk
> are in a sense competing projects by themselves, each with its own
> contributors and release cycles.

+1, that's a great split too. I'm not wed to simply splitting the project along
components, or systems or whatever.

Whatever makes sense to get communities of people working together
at Apache is what I'm after. Community != technical.

>
> 2. From technical (not community) viewpoint your "svn copy" is an ugly
> approach,
> [...snip...]

+1, it totally is ugly -- I used it for illustration in the hope that the Hadoop
technical experts could come up with a better one and stop using it as an
excuse not to fix the community problems.

>
> 3. I am as skeptical as Todd that the community problems will be
> solved by simply TLP-ing the three projects.
> Two years ago Hadoop was in crisis as vendors were producing their own
> releases calling it Hadoop.
> I think this was solved, but "poor community behavior" and contentions
> remained, embrace them or not.

Vendors still produce their own releases on top of Hadoop, whether they
call them Hadoop or not. That problem isn't fixed, and won't be fixed -- it's
grown too much.

>
> 4. Having said the above, separating the projects seems reasonable.
> (See timing though)
> HDFS will inevitably have to inherit and maintain most of Common.
> I totally understand the frustration of people who just put a huge effort
> into merging
> the sources back under a common root.

Me too, which is why I'm not pushing for one approach or another, or
prescribing how to solve these types of things. I'm not sure what the answer
is, but I do know that it's most important to get projects that understand
how things work here at Apache.

>
> 5. Timing is important.
> Waiting until Hadoop 2 is stable as Arun suggested earlier would
> probably be too long.
> Doing it next week, without discussing and solving the technical issues
> listed in the thread, would be premature.
> I think Hadoop 0.23.3 release backed by Yahoo production has a
> potential to become
> the next stable version, letting the project move ahead off the
> four year old code base.
> We should help that happen first, and do necessary preparations for
> the split in the meantime.

Sounds reasonable to me.

Thanks for your feedback.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [EMAIL PROTECTED]
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++