Hadoop, mail # general - [VOTE] Shall we adopt the "Defining Hadoop" page

Re: [VOTE] Shall we adopt the "Defining Hadoop" page
Owen O'Malley 2011-06-15, 02:45

On Jun 14, 2011, at 5:48 PM, Eli Collins wrote:

> Wrt derivative works, it's not clear from the document, but I think we
> should explicitly adopt the policy of HTTPD and Subversion that
> backported patches from trunk and security fixes are permitted.

Actually, the document is extremely clear that only Apache releases may be called Hadoop.

There was a very long thread about how the rapidly expanding Hadoop ecosystem is leading to a lot of customer confusion about the different "versions" of Hadoop. We as the Hadoop project don't have the resources or the necessary compatibility test suite to test compatibility between the different sets of cherry-picked patches. We also don't have time to ensure that all of the thousands of patches applied to 0.20.2 in each of the many (10? 15?) different versions have been committed to trunk. Furthermore, under the Apache license, a company Foo could claim that its product is a cherry-picked version of Hadoop without releasing the source code that would enable verification.

In summary,
  1. Hadoop is very successful.
  2. There are many different commercial products that are trying to use the Hadoop name.
  3. We can't check or enforce that the cherry-picked versions are following the rules.
  4. We don't have a TCK like Java does to validate that new versions are compatible.
  5. By far the fairest way to ensure compatibility and fairness between companies is that only Apache Hadoop releases may be called Hadoop.

That said, a package that includes a small number (< 3) of security patches that haven't been released yet doesn't seem unreasonable.

-- Owen