Re: Defining Hadoop Compatibility -revisiting-
On Thu, May 12, 2011 at 09:45, Milind Bhandarkar
<[EMAIL PROTECTED]> wrote:
> HCK and written specifications are not mutually exclusive. However, given
> the evolving nature of Hadoop APIs, functional tests need to evolve as

I would actually expand it to 'functional and system tests' because the
latter are capable of validating inter-component interactions not
coverable by functional tests.

Cos

> well, and having them tied to a "current stable" version is easier to do
> than it is to tie the written specifications.
>
> - milind
>
> --
> Milind Bhandarkar
> [EMAIL PROTECTED]
> +1-650-776-3167
>
>
>
>
>
>
> On 5/11/11 7:26 PM, "M. C. Srivas" <[EMAIL PROTECTED]> wrote:
>
>>While the HCK is a great idea to check quickly if an implementation is
>>"compliant",  we still need a written specification to define what is
>>meant
>>by compliance, something akin to a set of RFC's, or a set of docs like the
>> IEEE POSIX specifications.
>>
>>For example, the POSIX.1c pthreads API has a written document that
>>specifies
>>all the function calls, input params, return values, and error codes. It
>>clearly indicates what any POSIX-compliant threads package needs to
>>support,
>>and what are vendor-specific non-portable extensions that one can use at
>>one's own risk.
>>
>>Currently we have 2 sets of APIs in the DFS and Map/Reduce layers, and the
>>specification is extracted only by looking at the code, or (where the code
>>is non-trivial) by writing really bizarre test programs to examine corner
>>cases. Further, the interaction between a mix of the old and new APIs is
>>not
>>specified anywhere. Such specifications are vitally important when
>>implementing libraries like Cascading, Mahout, etc. For example, an
>>application might open a file using the new API, and pass that stream
>>into a
>>library that manipulates the stream using some of the old API ... what is
>>then the expectation of the state of the stream when the library call
>>returns?
>>
>>Sanjay Radia @ Y! already started specifying some of the DFS APIs to nail
>>such
>>things down. There's similar good effort in the Map/Reduce and Avro
>>spaces,
>>but it seems to have stalled somewhat. We should continue it.
>>
>>Doing such specs would be a great service to the community and the users
>>of
>>Hadoop. It provides them
>>   (a) clear-cut docs on how to use the Hadoop APIs
>>   (b) wider choice of Hadoop implementations by freeing them from vendor
>>lock-in.
>>
>>Once we have such a specification, the HCK becomes meaningful (since the HCK
>>itself will be buggy initially).
>>
>>
>>On Wed, May 11, 2011 at 2:46 PM, Milind Bhandarkar
>><[EMAIL PROTECTED]
>>> wrote:
>>
>>> I think it's time to separate out functional tests as a "Hadoop
>>> Compatibility Kit (HCK)", similar to the Sun TCK for Java, but under ASL
>>> 2.0. Then "certification" would mean "Passes 100% of the HCK testsuite."
>>>
>>> - milind
>>> --
>>> Milind Bhandarkar
>>> [EMAIL PROTECTED]
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 5/11/11 2:24 PM, "Eric Baldeschwieler" <[EMAIL PROTECTED]> wrote:
>>>
>>> >This is a really interesting topic!  I completely agree that we need to
>>> >get ahead of this.
>>> >
>>> >I would be really interested in learning of any experience other Apache
>>> >projects, such as Apache or Tomcat, have with these issues.
>>> >
>>> >---
>>> >E14 - typing on glass
>>> >
>>> >On May 10, 2011, at 6:31 AM, "Steve Loughran" <[EMAIL PROTECTED]>
>>>wrote:
>>> >
>>> >>
>>> >> Back in Jan 2011, I started a discussion about how to define Apache
>>> >> Hadoop Compatibility:
>>> >>
>>> >>
>>>
>>>http://mail-archives.apache.org/mod_mbox/hadoop-general/201101.mbox/%3C4D
>>> >>[EMAIL PROTECTED]%3E
>>> >>
>>> >> I am now reading EMC HD "Enterprise Ready" Apache Hadoop datasheet
>>> >>
>>> >>
>>>
>>> >> http://www.greenplum.com/sites/default/files/EMC_Greenplum_HD_DS_Final_1.pdf
>>> >>
>>> >> It claims that their implementations are 100% compatible, even though
>>> >> the Enterprise edition uses a C filesystem. It also claims that both