Fwd: Compatibility in Apache Hadoop
Eli Collins 2013-04-23, 01:32
On Mon, Apr 22, 2013 at 5:42 PM, Steve Loughran <[EMAIL PROTECTED]> wrote:
> On 22 April 2013 14:00, Karthik Kambatla <[EMAIL PROTECTED]> wrote:
> > Hadoop devs,
> > This doc does not intend to propose new policies. The idea is to have one
> > document that outlines the various compatibility concerns (lots of areas
> > beyond API compatibility), captures the respective policies that exist, and
> > if we want to define policies for the items where it’s not clear we have
> > something to iterate on.
> > The first draft just lists the types of compatibility. In the next step, we
> > can add existing policies and subsequently work towards policies for
> > others.
> I don't see -yet- a definition of compatible at the API signature level vs
> semantics level.
> The @ interface attributes say "these methods are
> internal/external/stable/unstable" (there's also @VisibleForTesting, which
> comes out of Guava, yes?).
> There's a separate issue that says "we make some guarantee that the
> behaviour of an interface remains consistent over versions", which is hard
> to do without some rigorous definition of what the expected behaviour of an
> implementation should be.
Good point, Steve. I've assumed the semantics of the API had to
respect the attribute (eg changing the semantics of FileSystem#close
would be an incompatible change, since this is a public/stable API,
even if the new semantics are arguably better). But you're right,
unless we've actually defined what the semantics of the APIs are it's
hard to say if we've materially changed them. How about adding a new
section on the page and calling that out explicitly?
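
To make the "semantics have to respect the attribute" assumption concrete,
here is a rough sketch. The annotations and classes below are local,
hypothetical stand-ins defined only for this example (they are not the
Hadoop classification annotations themselves), and the rule in
semanticsFrozen() is just my reading of the assumption above, not an
agreed policy.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class AttributeSketch {

    // Hypothetical stand-ins for audience/stability attributes.
    @Retention(RetentionPolicy.RUNTIME) @interface Public {}
    @Retention(RetentionPolicy.RUNTIME) @interface Stable {}
    @Retention(RetentionPolicy.RUNTIME) @interface Unstable {}

    // Hypothetical end user-facing API: both signature and behaviour
    // would be expected to stay compatible.
    @Public @Stable
    interface DemoFileSystem {
        void close();
    }

    // Hypothetical internal helper: free to change between releases.
    @Unstable
    interface DemoInternalHelper {
        void refresh();
    }

    // Assumed rule: a type marked both Public and Stable must not
    // change its semantics in an incompatible way.
    static boolean semanticsFrozen(Class<?> api) {
        return api.isAnnotationPresent(Public.class)
            && api.isAnnotationPresent(Stable.class);
    }

    public static void main(String[] args) {
        System.out.println(semanticsFrozen(DemoFileSystem.class));
        System.out.println(semanticsFrozen(DemoInternalHelper.class));
    }
}
```

The attribute only tells you that the semantics matter; it does not say
what they are, which is exactly the gap Steve is pointing at.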
In practice I think we'll have to take semantics case by case, clearly
define the semantics we care about in the javadocs (for the
major end user-facing classes at least, calling out both intended
behavior and behavior that's meant to be undefined) and use
individual judgement elsewhere. For example, HDFS-4156 changed
DataInputStream#seek to throw an IOE if you seek to a negative offset,
instead of succeeding and then hitting an NPE on the next access.
That's an incompatible change in terms of semantics, but not semantics
intended by the author, or likely semantics programs depend on.
However if a change made FileSystem#close three times slower, this is
perhaps a smaller semantic change (eg doesn't change what exceptions
get thrown) but probably much less tolerable for end users.
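
As a rough illustration of the seek example, the sketch below contrasts
the two behaviours. LenientStream and StrictStream are hypothetical toy
classes written for this mail, not the actual HDFS code; the null
"block" lookup is only an assumed way of reproducing a late NPE.

```java
import java.io.IOException;

public class SeekSemanticsSketch {

    // Hypothetical "before" behaviour: seek() accepts any offset and
    // the problem only shows up on the next read().
    static class LenientStream {
        private final byte[] data;
        private long pos;

        LenientStream(byte[] data) { this.data = data; }

        void seek(long target) { pos = target; }

        int read() {
            // No block covers an out-of-range offset, so this is null.
            byte[] block = (pos >= 0 && pos < data.length) ? data : null;
            return block[(int) pos++] & 0xff; // NPE when block is null
        }
    }

    // Hypothetical "after" behaviour: a negative offset is rejected
    // immediately with a checked exception.
    static class StrictStream {
        private final byte[] data;
        private long pos;

        StrictStream(byte[] data) { this.data = data; }

        void seek(long target) throws IOException {
            if (target < 0) {
                throw new IOException("Cannot seek to negative offset: " + target);
            }
            pos = target;
        }

        int read() {
            return pos < data.length ? data[(int) pos++] & 0xff : -1;
        }
    }

    // Reports where (if anywhere) the lenient stream fails.
    static String lenientFailure(long offset) {
        LenientStream in = new LenientStream(new byte[] {1, 2, 3});
        in.seek(offset);
        try {
            in.read();
            return "none";
        } catch (NullPointerException e) {
            return "NPE on read";
        }
    }

    // Reports where (if anywhere) the strict stream fails.
    static String strictFailure(long offset) {
        StrictStream in = new StrictStream(new byte[] {1, 2, 3});
        try {
            in.seek(offset);
            in.read();
            return "none";
        } catch (IOException e) {
            return "IOException on seek";
        }
    }

    public static void main(String[] args) {
        System.out.println(lenientFailure(-1));
        System.out.println(strictFailure(-1));
    }
}
```

A caller that caught the late NPE would break under the strict version,
which is why this counts as a semantic change even though it's hard to
imagine a program relying on the old behaviour.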
In any case, even if we get an 80% solution to the semantics issue
we'll probably be in good shape for v2 GA if we can sort out the
remaining topics. See any other topics missing? Once the overall
outline is in shape it makes sense to annotate the page with the
current policy (if there's already consensus on one), and identify
areas where we need to come up with a policy or are leaving TBD.
Currently this is a source of confusion for new developers, some
downstream projects and users.