Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Hi Bobby,

On Aug 29, 2012, at 8:17 AM, Robert Evans wrote:

> I personally am for splitting up the projects.  I think there is a lot of
> potential that each of the projects could have on their own, and I expect
> to see them evolve in new and interesting ways when the projects are not
> tied directly together.
> But, in order to get there we need to address the issues that made the
> first split attempt fail.  

Sorry I snipped the above, but mainly I just don't buy the argument that
there are a bunch of technical things that *block* splitting the projects.

Today, right now, I could propose a new Incubator project and call it
BooDoopADoop. I could add 5-7 (or 4) people that I think I would work
well with. I could invite others to join in the Incubator as part of the
initial PPMC list and committer list. We could write in our proposal
that the existing Hadoop community is technically amazing, but over time
has been mired in a bunch of community issues, and we'd like to take
our crack at the source code in a brand new Apache project called
BooDoopADoop.

Then for the code portion of the Incubator proposal, I could say, I will
svn copy all of Hadoop into BooDoopADoop and then start from there.
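For context, a server-side `svn copy` like the one described is cheap: Subversion copies are constant-time metadata operations, so seeding a new project from an existing tree is a single atomic commit. A minimal sketch against a throwaway local repository (assuming the `svn`/`svnadmin` tools are installed; all paths and names here are hypothetical, not real ASF repository URLs):

```shell
set -e

# Create a scratch repository to stand in for the ASF repo.
REPO_DIR=$(mktemp -d)/demo-repo
svnadmin create "$REPO_DIR"
REPO="file://$REPO_DIR"

# Seed a stand-in for the existing Hadoop tree.
svn mkdir -q -m "layout" "$REPO/hadoop" "$REPO/incubator"
WC=$(mktemp -d)
svn checkout -q "$REPO/hadoop" "$WC"
echo "common code" > "$WC/common.txt"
svn add -q "$WC/common.txt"
svn commit -q -m "seed hadoop" "$WC"

# The "fork": one atomic server-side copy of the whole tree --
# no files are duplicated client-side, only repository metadata.
svn copy -q -m "Seed BooDoopADoop from Hadoop" \
    "$REPO/hadoop" "$REPO/incubator/BooDoopADoop"

# The copied tree is immediately browsable and carries full history.
svn ls "$REPO/incubator/BooDoopADoop"
```

From that point on, the copy accumulates its own history independently, which is exactly the "start from there" part.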

So, given that I could do that (as could others), I would also have to
be prepared for the community bad-will and general ASF bad-will that
it may cause. It may not cause ASF bad-will, b/c in general
the Foundation doesn't care about competing projects or technologies.
It does care about splintering communities and the like, though. Moreover,
beyond the Foundation concerns, I would also have to concern myself
with pissing you guys off, and all the downstream organizations,
companies, and individuals that are part of the Hadoop ecosystem
that may be pissed off about the way we injected code into
BooDoopADoop. But again, there's nothing stopping me from doing that.

I'd like to point out that in the above scenario, I don't have to worry about
release schedules, and this, or that, and the other. Or APIs, or whatever.
I have BooDoopADoop, and so does the new community around it in the
Incubator, and we simply "go". Then, if others upstream or downstream
find BooDoopADoop useful, they take it and incorporate it into
their project. Perhaps Hadoop HDFS finds our improvements to BooDoopADoop
and its distributed file system better; perhaps we did some Maven magic
that made our jar file better or more attractive to use, and it saved Hadoop HDFS
coding time and whatever. So Hadoop HDFS integrates it.

See how this could work?

So, take me out of BooDoopADoop and replace that with the Hadoop
PMC, and the specific subsets of you guys that are actually really distinct
PMC members of distinct communities living within the Hadoop ecosystem.
Sure, you want to technically work together on releases, and APIs, and whatever,
but those are *inter-community* issues, more so than *intra-community* across
the Foundation. Sure, it's good to try and coordinate, b/c you guys all have $dayjobs,
and the software you build at those $dayjobs is contributed upstream into the
ASF, and then others depend on it (and then others downstream of the ASF, and
even downstream of your companies, depend on it, and so on and so forth). However,
as far as the Foundation is concerned, communities and projects (1:1 ideally)
coordinate releases on an inter-community level, not intra-*. The intra-* is usually
just icing and way more difficult.

> As part of this we also need to have a clear set of rules about what it
> takes to become a committer or PMC member for the new projects when they
> split off.  I am fine with all committers become PMC members,

+1, me too, and your suggestion below about "if we merge..." is one option
for doing so. But there could be others, and discussing them and putting
them up on a list is probably a good idea.

I would honestly suggest that someone(s) take a stab at the lists of the new
PMC members for the new TLPs, put something out there,
and then remove or add people as needed.

And yes, I fully agree that the PMC lists should not simply be the
full Hadoop PMC per new TLP -- then we've just replicated the inherent
problem 3x over instead of 1x :)

However, I don't know the ins and outs well enough to say who those lists should
be for HDFS, MR, and YARN. I bet you guys do, though, so someone, step up
and throw something out there for others to shoot down... errr, I mean improve! :)


See my BooDoopADoop example above. I don't think that someone in new TLP X wanting
to make a change in their copy of Common will matter to TLP Y. It shouldn't.
It *can*, over time, if there is coordination between X and Y, but it doesn't
have to. Get what I mean?

This is *not* a technical issue :) This is a community issue. It's independent
of the technical issues. This is about how to fix the community issues.

But yes, if you guys want to release some upcoming version first or whatever,
fine and dandy if the community agrees, but it shouldn't be a gate to fixing
community issues.

This happens in the Incubator all the time. The big question when a project is
releasing and has a graduation VOTE near that release (before or
after) is: do we wait to graduate? I'm always a fan of just moving forward on
graduation, b/c it's independent of the technical stuff.

Dealing with Hadoop technical problems is probably not my forte anymore
(if it ever was : ) ). I'm here as a Foundation member trying to help
with the community problems.

In the end, forking is what you guys should do :) You should just do it
at Apache. "Fork" the current Hadoop uber-project into the actual communities
that actually exist. You can fork directly out as TLPs, or incubate the forks.
But doing it here would be great :)

Thanks for your thoughts, Bobby. Hope that explains where I am coming from.



Chris Mattmann, Ph.D.