Hadoop >> mail # general >> Fwd: [VOTE] Shall we adopt the "Defining Hadoop" page

Jeff Hammerbacher 2011-06-20, 08:28
Andrew Purtell 2011-06-20, 16:39
Re: Fwd: [VOTE] Shall we adopt the "Defining Hadoop" page
Great summary Andrew.

I would add one more precipitating factor here.  That is the arrival of a
number of products which are very close to the Apache version of Hadoop but
for which there is no good and widely accepted terminology that gives proper
credit to their lineage while making clear the distinction from bit-for-bit
copies of official Apache releases.

Some products are analogous to Hive, Pig or HBase in that they are
independent systems that run ON Hadoop (or close equivalents).  These have
no terminology problem because these products aren't Hadoop, but rather use it.

Other products contain Hadoop internally as a critical component but do not
necessarily expose Hadoop capabilities to the end user (I can't name these
products, but they exist).  These products have little nomenclatural
difficulty because the powered-by-Hadoop description fits very well.

The products with the terminology problem are the ones that add either
curation and packaging (Cloudera) or substantial additional performance
enhancing components (MapR).  These products are upwardly compatible with
Apache Hadoop in that programs that run on Hadoop will very probably run on
these Hadoop-like systems.  The problem is that there is no good term for
these products.  They may even contain components that are bit-for-bit
identical to the same components in Apache releases.  It is fair to say
that these are not Apache-released software, but it is also fair to say that
there ought to be a better name for the class of these products.

On Mon, Jun 20, 2011 at 4:39 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:

> Hadoop I think needs to be more careful. What triggered this discussion is
> the arrival of new players releasing products they call Hadoop but
> containing severe changes the community, by way of the ASF umbrella we all
> work under, had nothing to do with designing or developing. And some of
> these are being open sourced as a Hadoop. There is no Linus here. Which of
> these is _the_ Hadoop? As a would-be contributor, which should I select?