Re: Fwd: [VOTE] Shall we adopt the "Defining Hadoop" page
Andrew Purtell 2011-06-20, 16:39
First, apologies for removing most of your argument for clarity. Readers can find it in the general@ archives I am sure.
> Lastly, I'd love to learn more about how other prominent open source
> projects have approached this issue. If you have any knowledge about
> how Linux handled the use of its trademark, please add your
> thoughts to
> Because Apache Hadoop is a kernel technology, similar to Linux, I
> suspect there are many useful lessons to learn. Or at least crazy
> email threads to read.
I would argue the concern about trademark has an additional dimension here, and perhaps a fairly core additional motivation for protecting the mark, because these are open source projects. The mention of Linux helps to illustrate it.
The obvious difference between Hadoop and Linux is Linux has a universally recognized clear hierarchy with a single -- and exceptional, and quickly and forcefully opinionated -- authority at the top. For Linux, the power to define Linux rests obviously with Linus. Regarding Hadoop, the power to do anything, including define what is Hadoop, is diffuse.
For would-be open source participants who want to contribute to the Linux kernel, the canonical source of the Linux kernel is clearly Linus' tree and you want your contribution to end up there. He is the authority. Linux will always be defined by Linus until he is gone. (That is a long term problem for Linux of course.) It is a benevolent dictatorship that perhaps uniquely works, allowing enough contributors to see the fruits of their labor to sustain it while simultaneously maintaining a strong identity.
Hadoop has no equivalent.
Linux, for now at least, can be quite liberal in how the Linux mark is used because of how its identity as a project, and therefore its ability to attract contributions, is defined.
Hadoop, I think, needs to be more careful. What triggered this discussion is the arrival of new players releasing products they call Hadoop but which contain severe changes that the community, by way of the ASF umbrella we all work under, had nothing to do with designing or developing. And some of these are being open sourced as a Hadoop. There is no Linus here. Which of these is _the_ Hadoop? As a would-be contributor, which should I select?
Already we have some issues. In some cases I'd rather contribute to Cloudera sources because at least I know my contribution to CDH will see a timely release.
Furthermore, I believe the extent to which users see value in ASF Hadoop, and have a clear definition of what ASF Hadoop is, will be correlated with the extent to which the ASF can attract enough contributions to Hadoop to sustain innovation against competing technologies.
The open source value proposition "I contribute to Hadoop" impacts the long term survival of the project. Individuals and organizations are both motivated by this, for various reasons.