Re: Defining Hadoop Compatibility -revisiting-
On Thu, May 12, 2011 at 09:45, Milind Bhandarkar <[EMAIL PROTECTED]> wrote:
> HCK and written specifications are not mutually exclusive. However, given
> the evolving nature of Hadoop APIs, functional tests need to evolve as

I would actually expand it to 'functional and system tests', because the
latter are capable of validating inter-component interactions not coverable
by functional tests.
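
For instance, a system test can spin up an in-process cluster and exercise
several components together. A minimal sketch, assuming the MiniDFSCluster
harness from Hadoop's test jar (constructor details vary across versions,
and the probe path is illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.MiniDFSCluster;

    public class SystemTestSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Start a single-datanode HDFS cluster in-process.
            MiniDFSCluster cluster = new MiniDFSCluster(conf, 1, true, null);
            try {
                // The check crosses components: the client writes through
                // the live NameNode and DataNode, not a mocked filesystem.
                FileSystem fs = cluster.getFileSystem();
                Path probe = new Path("/systest/probe");
                fs.create(probe).close();
                System.out.println("visible to client: " + fs.exists(probe));
            } finally {
                cluster.shutdown();
            }
        }
    }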

Cos

> well, and having them tied to a "current stable" version is easier than
> tying written specifications to one.
>
> - milind
>
> --
> Milind Bhandarkar
> [EMAIL PROTECTED]
> +1-650-776-3167
>
>
>
>
>
>
> On 5/11/11 7:26 PM, "M. C. Srivas" <[EMAIL PROTECTED]> wrote:
>
>>While the HCK is a great idea for quickly checking whether an implementation
>>is "compliant", we still need a written specification to define what is
>>meant by compliance, something akin to a set of RFCs, or a set of docs like
>>the IEEE POSIX specifications.
>>
>>For example, the POSIX.1c pthreads API has a written document that specifies
>>all the function calls, input params, return values, and error codes. It
>>clearly indicates what any POSIX-compliant threads package needs to support,
>>and which vendor-specific, non-portable extensions one can use at one's own
>>risk.
>>
>>Currently we have two sets of APIs in the DFS and Map/Reduce layers, and the
>>specification can be extracted only by reading the code, or (where the code
>>is non-trivial) by writing really bizarre test programs to examine corner
>>cases. Further, the interaction between a mix of the old and new APIs is not
>>specified anywhere. Such specifications are vitally important when
>>implementing libraries like Cascading, Mahout, etc. For example, an
>>application might open a file using the new API and pass that stream into a
>>library that manipulates the stream using some of the old API ... what is
>>then the expected state of the stream when the library call returns?
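>>
>>A minimal sketch of that ambiguity (the path and the legacyRead helper are
>>illustrative, not real library code): a stream opened through the newer
>>FileContext API is handed to code written against the old stream idioms.
>>
>>    import org.apache.hadoop.conf.Configuration;
>>    import org.apache.hadoop.fs.FSDataInputStream;
>>    import org.apache.hadoop.fs.FileContext;
>>    import org.apache.hadoop.fs.Path;
>>
>>    public class MixedApiSketch {
>>        public static void main(String[] args) throws Exception {
>>            // Open a file via the newer FileContext API.
>>            FileContext fc = FileContext.getFileContext(new Configuration());
>>            FSDataInputStream in = fc.open(new Path("/data/input.txt"));
>>            // Hand the stream to a library coded against the old API.
>>            legacyRead(in);
>>            // Without a written spec, the stream position here is only
>>            // discoverable by experiment.
>>            System.out.println("position after library call: " + in.getPos());
>>            in.close();
>>        }
>>
>>        // Hypothetical stand-in for an old-API library routine.
>>        static void legacyRead(FSDataInputStream in) throws Exception {
>>            in.seek(0);
>>            byte[] buf = new byte[128];
>>            int n = in.read(buf); // moves the caller's position as a side effect
>>        }
>>    }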
>>
>>Sanjay Radia @ Y! has already started specifying some of the DFS APIs to
>>nail such things down. There are similar good efforts in the Map/Reduce and
>>Avro spaces, but they seem to have stalled somewhat. We should continue
>>them.
>>
>>Doing such specs would be a great service to the community and the users
>>of Hadoop. It provides them:
>>   (a) clear-cut docs on how to use the Hadoop APIs, and
>>   (b) a wider choice of Hadoop implementations, by freeing them from
>>vendor lock-in.
>>
>>Once we have such a specification, the HCK becomes meaningful (since the
>>HCK itself will be buggy initially).
>>
>>
>>On Wed, May 11, 2011 at 2:46 PM, Milind Bhandarkar <[EMAIL PROTECTED]>
>>wrote:
>>
>>> I think it's time to separate out functional tests as a "Hadoop
>>> Compatibility Kit (HCK)", similar to the Sun TCK for Java, but under ASL
>>> 2.0. Then "certification" would mean "passes 100% of the HCK test suite."
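>>>
>>> A minimal sketch of what one HCK case might look like (JUnit 4; the
>>> expected rename semantics and the paths are assumptions that the
>>> written spec would have to pin down):
>>>
>>>     import static org.junit.Assert.assertFalse;
>>>     import org.apache.hadoop.conf.Configuration;
>>>     import org.apache.hadoop.fs.FileSystem;
>>>     import org.apache.hadoop.fs.Path;
>>>     import org.junit.Test;
>>>
>>>     public class FileSystemRenameHckTest {
>>>         @Test
>>>         public void renameOntoExistingFileFails() throws Exception {
>>>             FileSystem fs = FileSystem.get(new Configuration());
>>>             Path src = new Path("/tmp/hck-src");
>>>             Path dst = new Path("/tmp/hck-dst");
>>>             fs.create(src).close();
>>>             fs.create(dst).close();
>>>             try {
>>>                 // Every implementation claiming compatibility must
>>>                 // agree on this return value for "pass" to mean much.
>>>                 assertFalse(fs.rename(src, dst));
>>>             } finally {
>>>                 fs.delete(src, true);
>>>                 fs.delete(dst, true);
>>>             }
>>>         }
>>>     }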
>>>
>>> - milind
>>> --
>>> Milind Bhandarkar
>>> [EMAIL PROTECTED]
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 5/11/11 2:24 PM, "Eric Baldeschwieler" <[EMAIL PROTECTED]> wrote:
>>>
>>> >This is a really interesting topic! I completely agree that we need to
>>> >get ahead of this.
>>> >
>>> >I would be really interested in learning of any experience other Apache
>>> >projects, such as HTTPD or Tomcat, have had with these issues.
>>> >
>>> >---
>>> >E14 - typing on glass
>>> >
>>> >On May 10, 2011, at 6:31 AM, "Steve Loughran" <[EMAIL PROTECTED]> wrote:
>>> >
>>> >>
>>> >> Back in Jan 2011, I started a discussion about how to define Apache
>>> >> Hadoop Compatibility:
>>> >>
>>> >> http://mail-archives.apache.org/mod_mbox/hadoop-general/201101.mbox/%3C4D[EMAIL PROTECTED]%3E
>>> >>
>>> >> I am now reading the EMC HD "Enterprise Ready" Apache Hadoop datasheet:
>>> >>
>>> >> http://www.greenplum.com/sites/default/files/EMC_Greenplum_HD_DS_Final_1.pdf
>>> >>
>>> >> It claims that their implementations are 100% compatible, even though
>>> >> the Enterprise edition uses a C filesystem. It also claims that both