Hadoop, mail # general - [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project


Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Todd Lipcon 2012-08-31, 16:59
On Thu, Aug 30, 2012 at 11:50 PM, Mattmann, Chris A (388J)
<[EMAIL PROTECTED]> wrote:
> Hi Andrew,
>
> On Aug 30, 2012, at 11:42 PM, Andrew Purtell wrote:
>
>> If Apache Hadoop -- as an umbrella or sum of its parts -- isn't practical
>> to develop end applications or downstream projects on, the community will
>> disappear.
>
> Sure, the end-user community might disappear, but the point I'm trying to make is
> that the community is more than that. It's developers that build code together
> ("community over code"); it's folks who write documentation who are part of the
> project's committee of folks working together to develop software for the public
> good at this Foundation. It's folks who write unit tests as part of that.  It's also people
> that fly by on the lists and that need help; or that may throw up a patch, or
> whatever. It's other members of the Apache Software Foundation that are
> charged with caring and giving a rip about the Foundation's projects.

Well, speaking as a member of the developer community who hasn't been a
traditional user of Hadoop since my previous job in 2008: if the end
user community started to languish, I (and 80% of the other most
involved contributors) would probably stop working on the project
pretty quickly. We're here because a user community exists, which
funds our employers, who fund us.

Another point I'll make is that I've talked to a number of former
contributors (from the 0.20 days) who pretty much stopped contributing
because of the code base churn around the prior project split. It
became too much effort to forward-port and back-port patches from their
internal branches, so their cost/reward tradeoff dipped negative. So
there are real community costs associated with what seem like
"technical" changes.

I don't know who came up with the original "community over code"
mantra, or whether the ASF truly thinks these are hard and fast rules
rather than principles and guidelines. But, if I may be so bold, I
would much prefer the mantra of "community around code". Without the
code at the center of any project, we'd just be a bunch of nerds
shooting the shit. The code's what ties us together, and the pressure
of keeping a centralized codebase that we can all feel good about
shipping is what allows us to get past our differences and produce
high quality software.

The best reference I can find on apache.org is the Committer's FAQ:
http://www.apache.org/dev/committers.html where it says explicitly:

> Note: While there is not an official list, the following six principles have been cited as the core beliefs of The Apache Way:
> - collaborative software development
> - commercial-friendly standard license
> - consistently high quality software
> - respectful, honest, technical-based interaction
> - faithful implementation of standards
> - security as a mandatory feature

Maybe you disagree, but from my perspective, we're doing reasonably
well on all of them. You may not think there's much collaboration, but
in the last 2-3 weeks, I have collaborated on Hadoop-related work with
developers from Trend Micro, Facebook, Calxeda, Hortonworks, and
interacted with users from a much wider variety of organizations.

As Andrew said, I thought we were going along pretty well before this thread.
As for technical things we need to do to get to a feasible split: big
+1 that classpath pollution issues are near the top of the list. We need a
reasonable classloader strategy, and I think Tom's OSGi stuff is a
good start in that direction. But it's going to be quite some time
before that's all integrated and pulled into dependent projects, etc.
So let's work on it but not be rash in our decisions.

-Todd
--
Todd Lipcon
Software Engineer, Cloudera