Re: Defining Compatibility
FWIW the FileSystemContractBaseTest class and the FileContext*BaseTest
classes (and their concrete subclasses) are probably the closest thing
we have to compatibility tests for FileSystem and FileContext
implementations in Hadoop.
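
For a new implementation, the hookup is roughly this (a minimal sketch;
the "myfs" scheme and class name are placeholders, following the pattern
of the existing subclasses such as TestLocalFileSystemContract):

  import java.net.URI;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.FileSystemContractBaseTest;

  public class TestMyFileSystemContract extends FileSystemContractBaseTest {
    @Override
    protected void setUp() throws Exception {
      // The base class exposes a protected 'fs' field; every inherited
      // test (mkdirs, rename, overwrite, delete, ...) runs against it.
      fs = FileSystem.get(URI.create("myfs:///"), new Configuration());
      super.setUp();
    }
  }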

Tom

On Mon, Jan 31, 2011 at 7:59 AM, Steve Loughran <[EMAIL PROTECTED]> wrote:
> On 31/01/11 14:32, Chris Douglas wrote:
>>
>> Steve-
>>
>> It's hard to answer without more concrete criteria. Is this a
>> trademark question affecting the marketing of a product? A
>> cross-compatibility taxonomy for users? The minimum criteria to
>> publish a paper/release a product without eye-rolling? The particular
>> compatibility claims made by a system will be nuanced and specific; a
>> runtime that executes MapReduce jobs as they would run in Hadoop can
>> simply make that claim, whether it uses parts of MapReduce, HDFS, or
>> neither.
>
> No, I'm thinking more about what large-scale tests need to be run
> against the codebase before you can say "it works", and then how to show
> that it still works after some changes.
>
>>
>> For the various distributions "Powered by Apache Hadoop," one would
>> assume that compatibility will vary depending on the featureset and
>> the audience. A distribution that runs MapReduce applications
>> as-written for Apache Hadoop may be incompatible with a user's
>> deployed metrics/monitoring system. Some random script to scrape the
>> UI may not work. The product may only scale to 20 nodes. Whether these
>> are "compatible with Apache Hadoop" is awkward to answer generally,
>> unless we want to define the semantics of that phrase by policy.
>>
>> To put it bluntly, why would we bother to define such a policy? One
>> could assert that a fully-compatible system would implement all the
>> public/stable APIs as defined in HADOOP-5073, but who would that help?
>> And though interoperability is certainly relevant to systems built on
>> top of Hadoop, is there a reason the Apache project needs to be
>> involved in defining the standards for compatibility among them?
>
> Agreed, I'm just thinking about naming and definitions. Even with the
> stable/unstable and internal/external split, there's still the question of
> what the semantics of operations are, both explicit (this operation does X)
> and implicit (and it takes less than Y seconds to do it). It's those
> implicit things that always catch you out (indeed, they are the points of
> argument in things like the Java and Java EE compatibility test kits).
>
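
The public/stable split Chris mentions comes from the HADOOP-5073
classification annotations; a type declared as part of that surface would
look like this (the interface itself is made up for illustration):

  import org.apache.hadoop.classification.InterfaceAudience;
  import org.apache.hadoop.classification.InterfaceStability;

  // Audience says who may depend on the type, stability says how it may
  // change between releases. A "fully compatible" system would have to
  // implement everything tagged Public/Stable.
  @InterfaceAudience.Public
  @InterfaceStability.Stable
  public interface JobProgressReporter {   // hypothetical interface
    float getProgress(String jobId);
  }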
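
And the explicit/implicit split Steve describes is easy to see in test
form. A hypothetical method one could drop into a FileSystemContractBaseTest
subclass (using org.apache.hadoop.fs.Path and the base class's path() and
createFile() helpers): the explicit semantics are assertable, but nobody
can say where the implicit bound Y should come from.

  public void testRenameCompletesQuickly() throws Exception {
    Path src = path("/test/hadoop/src");
    Path dst = path("/test/hadoop/dst");
    createFile(src);

    long start = System.currentTimeMillis();
    assertTrue(fs.rename(src, dst));       // explicit: the operation does X
    long elapsedMs = System.currentTimeMillis() - start;

    assertFalse(fs.exists(src));           // explicit: the source is gone
    assertTrue(fs.exists(dst));            // explicit: the destination exists
    // implicit: "...and it takes less than Y seconds" -- but who defines Y?
    assertTrue("rename too slow", elapsedMs < 1000);
  }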