Re: Compatibility in Apache Hadoop
Karthik Kambatla 2013-04-23, 20:09
On Tue, Apr 23, 2013 at 9:00 AM, Steve Loughran <[EMAIL PROTECTED]> wrote:
> On 22 April 2013 18:32, Eli Collins <[EMAIL PROTECTED]> wrote:
> > On Mon, Apr 22, 2013 at 5:42 PM, Steve Loughran <[EMAIL PROTECTED]>
> > wrote:
> > >
> > > There's a separate issue that says "we make some guarantee that the
> > > behaviour of an interface remains consistent over versions", which is
> > > hard to do without some rigorous definition of what the expected
> > > behaviour of an implementation should be.
> > Good point, Steve. I've assumed the semantics of the API had to
> > respect the attribute (eg changing the semantics of FileSystem#close
> > would be an incompatible change, since this is a public/stable API,
> > even if the new semantics are arguably better). But you're right,
> > unless we've actually defined what the semantics of the APIs are it's
> > hard to say if we've materially changed them. How about adding a new
> > section on the page and calling that out explicitly?
> Maybe we should list which bits we consider both well specified and covered
> with tests that verify the implementations in our svn match that.
> > In practice I think we'll have to take semantics case by case, clearly
> > define the semantics we care about better in the javadocs (for the
> > major end user-facing classes at least, calling out both intended
> > behavior and behavior that's meant to be undefined) and using
> > individual judgement elsewhere. For example, HDFS-4156 changed
> > DataInputStream#seek to throw an IOE if you seek to a negative offset,
> > instead of succeeding then resulting in an NPE on the next access.
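As an illustration of the fail-fast behavior described above, here is a minimal sketch. It is not the HDFS code: `SeekableBytes` is a hypothetical in-memory stream, and the only point it makes is that a negative offset is rejected inside seek() with an IOException instead of surfacing as an unrelated failure on the next access.

```java
import java.io.IOException;

/** Hypothetical in-memory stream used for illustration; not an HDFS class. */
class SeekableBytes {
    private final byte[] data;
    private long pos;

    SeekableBytes(byte[] data) {
        this.data = data.clone();
    }

    /** Rejects a negative offset immediately and leaves the position untouched. */
    void seek(long target) throws IOException {
        if (target < 0) {
            throw new IOException("Cannot seek to negative offset " + target);
        }
        pos = target;
    }

    /** Returns the next byte as 0..255, or -1 at or past the end of the data. */
    int read() {
        if (pos >= data.length) {
            return -1;
        }
        return data[(int) pos++] & 0xff;
    }

    long getPos() {
        return pos;
    }
}
```

A caller that used to get away with a negative seek now sees the error at the call that caused it, which is the behavior change under discussion.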
> I'd seen that the DFS seek was the best implementation, but hadn't seen the
> cause. The other ones (especially the Buffered one that goes in front of
> most others) are much weaker.
> > That's an incompatible change in terms of semantics, but not semantics
> > intended by the author, or likely semantics programs depend on.
> That's a key problem: what do people depend on? A lot of the junit tests
> depended on the ordering of methods, after all.
> > However if a change made FileSystem#close three times slower, this is
> > perhaps a smaller semantic change (eg doesn't change what exceptions
> > get thrown) but probably much less tolerable for end users.
> You know that the blobstores all buffer their data so that
> 1. flush() is a no-op
> 2. the write takes place on close()
> #1 changes durability expectations, while #2 means the time to close() is
> O(data)*O(latency); P(fail) scales with time and distance, and as lots of
> code swallows exceptions on close, those failures may even be missed.
> Then there's the assumption that rename is atomic, which MapReduce depends on.
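The buffering behavior described above (flush() as a no-op, the write happening on close()) can be sketched as follows. This is an assumption-laden illustration, not any real blobstore client: `BlobUploader` and `BufferingBlobStream` are hypothetical names, and the upload is a single in-process call standing in for a remote PUT.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical stand-in for a remote blob store; not a real Hadoop interface. */
interface BlobUploader {
    void upload(byte[] blob) throws IOException;
}

/** Buffers every write locally and sends the whole blob only on close(). */
class BufferingBlobStream extends OutputStream {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final BlobUploader uploader;
    private boolean closed;

    BufferingBlobStream(BlobUploader uploader) {
        this.uploader = uploader;
    }

    @Override
    public void write(int b) throws IOException {
        if (closed) {
            throw new IOException("stream closed");
        }
        buffer.write(b);
    }

    @Override
    public void flush() {
        // No-op: nothing reaches the store, so flush() gives no durability.
    }

    @Override
    public void close() throws IOException {
        if (closed) {
            return;
        }
        closed = true;
        // The entire upload happens here, so close() cost grows with the data
        // size and this is the only place a store failure can surface.
        uploader.upload(buffer.toByteArray());
    }
}
```

With a stream like this, a caller that wraps close() in an empty catch block silently loses the whole file if the upload fails, since neither write() nor flush() ever touched the store.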
> > In any case, even if we get an 80% solution to the semantics issue
> > we'll probably be in good shape for v2 GA if we can sort out the
> > remaining topics. See any other topics missing? Once the overall
> > outline is in shape it makes sense to annotate the page with the
> > current policy (if there's already consensus on one), and identify
> > areas where we need to come up with a policy or are leaving TBD.
> > Currently this is a source of confusion for new developers, some
> > downstream projects and users.
> How about
> "semantic compatibility" : we strive to ensure that the behavior of APIs
> remains consistent over versions, though changes for correctness may result
> in changes in behavior. That is: if you relied on something which we
> consider to be a bug, it may get fixed.
> We are in the process of specifying some APIs more rigorously, enhancing
> our test suites to verify compliance with the specification, effectively
> creating a formal specification for the subset of behaviors that can be
> easily tested. We welcome involvement in this process, from both users and
> implementors of our APIs.
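One way to picture "test suites that verify compliance with the specification" is a contract check written once and run against every implementation. The sketch below is hypothetical throughout: `SeekTarget`, `SeekContract`, `StrictTarget` and `LaxTarget` are invented for illustration, and the two clauses checked are assumed examples, not the project's actual specification.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal interface being specified. */
interface SeekTarget {
    void seek(long offset) throws IOException;
    long position();
}

/** Executable form of two assumed specification clauses, reusable per implementation. */
class SeekContract {
    /** Returns the violated clauses; an empty list means the target complies. */
    static List<String> violations(SeekTarget target) {
        List<String> failures = new ArrayList<>();
        try {
            target.seek(5);
            if (target.position() != 5) {
                failures.add("seek(5) must move the position to 5");
            }
        } catch (IOException e) {
            failures.add("seek(5) must succeed");
        }
        long before = target.position();
        try {
            target.seek(-1);
            failures.add("seek(-1) must throw IOException");
        } catch (IOException expected) {
            if (target.position() != before) {
                failures.add("failed seek must not move the position");
            }
        }
        return failures;
    }
}

/** Complies with both clauses. */
class StrictTarget implements SeekTarget {
    private long pos;

    public void seek(long offset) throws IOException {
        if (offset < 0) {
            throw new IOException("negative offset " + offset);
        }
        pos = offset;
    }

    public long position() {
        return pos;
    }
}

/** Accepts any offset, so it violates the negative-seek clause. */
class LaxTarget implements SeekTarget {
    private long pos;

    public void seek(long offset) {
        pos = offset;
    }

    public long position() {
        return pos;
    }
}
```

Running `SeekContract.violations(...)` against each implementation turns "is this behavior part of the contract?" into a list of named clauses that either pass or fail, which is the subset of behavior that can be easily tested.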
+1. Thanks Steve.
Added Semantic compatibility to the wiki.
Added to the introduction as it applies to all compatibility.