

Re: [DISCUSS] - YARN as a sub-project of Apache Hadoop
Hey Aaron,

On Jul 25, 2012, at 11:16 PM, Aaron T. Myers wrote:

> On Wed, Jul 25, 2012 at 7:30 PM, Mattmann, Chris A (388J) <
> [EMAIL PROTECTED]> wrote:
>
>> I realize I'm asking a hard question here: why *aren't* they separate
>> projects? What's the barrier? They seem
>> to be operating that way (and have been for a while). And I don't see how
>> Hadoop still couldn't move along at
>> a fair clip with them as official TLPs themselves.
>>
>
> I'm opposed to this if for no other reason than that it makes it difficult
> to make logically-individual changes which span the projects. As much as we
> might like it to be the case, it is not presently true that Common is so
> independent and stable from HDFS and MR/YARN that Common could reasonably
> be separate and have its own release schedule. I think this view is
> supported by the fact that we once had separate SVN repos for Common, HDFS,
> and MR, but we undid that because having to make coordinated commits across
> the several repos, and the complex build dependencies it induced, was too
> onerous.

Fair enough.

>
> The main reason I'm opposed to making them separate projects is that I
> don't think their internal interfaces are so stable that they could
> reasonably release independently.
> Though we've been pretty good at
> maintaining the stability of the external interfaces, we routinely make
> changes in the internal interfaces of Common/HDFS/MR that make the projects
> fairly tightly-coupled. Note that Arun's proposal specifically calls out
> that the sub-projects would still release together, which I support.

Sub projects are not a good thing at Apache. Well, "official" sub projects
that have their own committees, mailing lists, etc. You guys aren't talking
about sub projects (though you call them that) -- in reality you are talking
about *products* that the Apache Hadoop PMC releases. They may have
different names, be on different release schedules, have different mailing
lists even (which I still think is not the right thing to do), but they are not *projects*.

I guess that's one thing that got me confused with Arun's original proposal:
in it there is talk of different sub-*projects* and making YARN a new sub-*project*
and discussion of it and MapReduce each attracting a diverse (implied: different)
community.

If you guys are talking about *products* that themselves have different *communities*
then pretty much at Apache those are different *projects*.

If you are talking about different *products* that themselves have *the same community*
who releases those *products* then we are talking about a single *project* at Apache
that has different *products* that it releases (am I confusing you yet?) :)

Regardless, I guess in the end what I was questioning was that if you look
at the net of Arun's proposal minus Project Dependencies (which are really
code-level things -- at Apache code is one thing, but we are dealing with
*communities*), and Release Cycles (no changes), the proposal boils down
to:

1. Creating separate mailing lists for YARN
2. an svn mv command

My advice on #1 was to be careful about splitting mailing lists, I've seen that cause trouble
(even before Hadoop existed and in other Apache projects I've cited), and then on #2,
why not execute the svn mv command and just move forward? You all are on the Hadoop
PMC and I assume trust Arun (and that he trusts you guys since you've given each other
the commit bit), so move forward on it.

As for #2, your point about being happy Arun brought this up, as it would have an
impact on the build cycle, etc., makes sense and is a good reason to DISCUSS it.
>
>> Yeah I know you are doing great -- my point is, technically, what consensus
>> is required -- you develop code at Apache
>> as individuals -- code is committed -- as are patches, etc. The PMC is
>> there to regulate that, but it sounds like code wise
>> you are proposing an svn mv command -- do you need an email thread to
>> discuss that? Why not just do it, and if someone

Yep thanks. This is good validation for #2 above then.
Yeah, that's cool. I do the same myself and that makes sense. It just
seemed like a formal proposal to create a project, minus the creating
project thing, so I thought I'd ask.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [EMAIL PROTECTED]
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++