Hadoop, mail # general - [DISCUSS] - YARN as a sub-project of Apache Hadoop

Re: [DISCUSS] - YARN as a sub-project of Apache Hadoop
Mattmann, Chris A 2012-07-26, 15:00
Hey Aaron,

On Jul 25, 2012, at 11:16 PM, Aaron T. Myers wrote:

> On Wed, Jul 25, 2012 at 7:30 PM, Mattmann, Chris A (388J) <
>> I realize I'm asking a hard question here: why *aren't* they separate
>> projects? What's the barrier? They seem
>> to be operating that way (and have been for a while). And I don't see how
>> Hadoop still couldn't move along at
>> a fair clip with them as official TLPs themselves.
> I'm opposed to this if for no other reason than that it makes it difficult
> to make logically-individual changes which span the projects. As much as we
> might like it to be the case, it is not presently true that Common is so
> independent and stable from HDFS and MR/YARN that Common could reasonably
> be separate and have its own release schedule. I think this view is
> supported by the fact that we once had separate SVN repos for Common, HDFS,
> and MR, but we undid that because having to make coordinated commits across
> the several repos, and the complex build dependencies it induced, was too
> onerous.

Fair enough.

> The main reason I'm opposed to making them separate projects is that I
> don't think their internal interfaces are so stable that they could
> reasonably release independently.
> Though we've been pretty good at
> maintaining the stability of the external interfaces, we routinely make
> changes in the internal interfaces of Common/HDFS/MR that make the projects
> fairly tightly-coupled. Note that Arun's proposal specifically calls out
> that the sub-projects would still release together, which I support.

Sub projects are not a good thing at Apache. Well, "official" sub projects
that have their own committees, mailing lists, etc. You guys aren't talking
about sub projects (though you call them that) -- in reality you are talking
about *products* that the Apache Hadoop PMC releases. They may have
different names, be on different release schedules, have different mailing
lists even (which I still think is not the right thing to do), but they are not *projects*.

I guess that's one thing that got me confused with Arun's original proposal:
in it there is talk of different sub-*projects* and making YARN a new sub-*project*
and discussion of it and Map Reduce and each attracting a diverse (implied: different)

If you guys are talking about *products* that themselves have different *communities*
then pretty much at Apache those are different *projects*.

If you are talking about different *products* that themselves have *the same community*
who releases those *products* then we are talking about a single *project* at Apache
that has different *products* that it releases (am I confusing you yet?) :)

Regardless, I guess in the end what I was questioning was that if you look
at the net of Arun's proposal minus Project Dependencies (which is really
code level things -- at Apache code is one thing, but we are dealing with
*communities*), and Release Cycles (no changes), the proposal boils down to:

1. Creating separate mailing lists for YARN
2. an svn mv command

My advice on #1 was to be careful about splitting mailing lists; I've seen that cause trouble
(even before Hadoop existed and in other Apache projects I've cited), and then on #2,
why not execute the svn mv command and just move forward? You all are on the Hadoop
PMC and I assume trust Arun (and that he trusts you guys since you've given each other
the commit bit), so move forward on it.

As for #2, your point about being happy that Arun brought this up, since it would have
an impact on the build cycle etc., makes sense and is a good reason to DISCUSS it.

>> Yeah I know you are doing great -- my point is, technically, what consensus
>> is required -- you develop code at Apache
>> as individuals -- code is committed -- as are patches, etc. The PMC is
>> there to regulate that, but it sounds like code wise
>> you are proposing an svn mv command -- do you need an email thread to
>> discuss that? Why not just do it, and if someone

Yep thanks. This is good validation for #2 above then.
Yeah, that's cool. I do the same myself and that makes sense. It just
seemed like a formal proposal to create a project, minus the creating
project thing, so I thought I'd ask.


Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
WWW:   http://sunset.usc.edu/~mattmann/
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA