Re: Compatibility in Apache Hadoop
Alejandro Abdelnur 2013-04-23, 18:50
Or with a twist, why not break/consolidate things as follows?
hdfs CLIENT IMPL
hdfs SERVER IMPL
<other filesystems> CLIENT
yarn CLIENT IMPL
yarn SERVER IMPL
IMO, this would significantly reduce dependency hell (for example, pulling
servlet and Jetty JARs into a Hadoop client app).
On Tue, Apr 23, 2013 at 11:32 AM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
> At the risk of hijacking this conversation a bit, what do you think of the
> notion of moving interfaces like Seekable and PositionedReadable into a new
> foundational Maven module, perhaps just for interfaces that define and
> tag support for core semantics, as their details become better defined and
> documented? I was involved in a discussion today about factoring out
> the codecs so that other ecosystem projects could pull in only the codec
> code, similar to how hadoop-auth is slender and has a useful servlet filter
> implementing SPNEGO authentication, so it gets pulled into various
> places and can even be used with Hadoop 1. The only thing preventing a
> clean separation of the codecs like this is their imports of Seekable and
> PositionedReadable. But these define behavior; they don't implement it.
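A minimal sketch of the proposal, assuming a hypothetical interfaces-only module: the two contracts below are simplified copies of Hadoop's `org.apache.hadoop.fs.Seekable` and `PositionedReadable` (the real interfaces declare more methods), and a codec-style class compiles against them without touching any implementation. `HeaderSniffer` and `ByteArraySource` are invented names for illustration.

```java
import java.io.EOFException;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical interfaces-only module: behaviour contracts, no implementation.
// Simplified from Hadoop's Seekable / PositionedReadable; not the real signatures in full.
interface Seekable {
  /** Moves the read position to pos; implementations must reject negative offsets. */
  void seek(long pos) throws IOException;

  /** Returns the current read position. */
  long getPos() throws IOException;
}

interface PositionedReadable {
  /**
   * Reads up to length bytes starting at position without changing the
   * stream's current position. Returns the number of bytes read, or -1 at EOF.
   */
  int read(long position, byte[] buffer, int offset, int length) throws IOException;
}

// Codec-style code that depends only on the contracts (illustrative name).
final class HeaderSniffer {
  private HeaderSniffer() {}

  /** True if src starts with magic. Assumes read() returns at least 1 byte or -1. */
  static boolean startsWith(PositionedReadable src, byte[] magic) throws IOException {
    byte[] buf = new byte[magic.length];
    int filled = 0;
    while (filled < buf.length) {
      int n = src.read(filled, buf, filled, buf.length - filled);
      if (n < 0) {
        return false; // EOF before the whole magic number was read
      }
      filled += n;
    }
    return Arrays.equals(buf, magic);
  }
}

// Throwaway in-memory implementation, only so the sketch can be exercised.
final class ByteArraySource implements Seekable, PositionedReadable {
  private final byte[] data;
  private long pos;

  ByteArraySource(byte[] data) {
    this.data = data.clone();
  }

  @Override
  public void seek(long newPos) throws IOException {
    if (newPos < 0 || newPos > data.length) {
      throw new EOFException("Invalid seek offset " + newPos);
    }
    pos = newPos;
  }

  @Override
  public long getPos() {
    return pos;
  }

  @Override
  public int read(long position, byte[] buffer, int offset, int length) throws IOException {
    if (position < 0) {
      throw new EOFException("Negative read offset " + position);
    }
    if (position >= data.length) {
      return -1;
    }
    int n = (int) Math.min(length, data.length - position);
    System.arraycopy(data, (int) position, buffer, offset, n);
    return n;
  }
}
```

Because `HeaderSniffer` only names the interfaces, a module containing it would need the contract module on its classpath, not the filesystem implementations.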
> On Tue, Apr 23, 2013 at 9:00 AM, Steve Loughran <[EMAIL PROTECTED]> wrote:
> > On 22 April 2013 18:32, Eli Collins <[EMAIL PROTECTED]> wrote:
> > > On Mon, Apr 22, 2013 at 5:42 PM, Steve Loughran <[EMAIL PROTECTED]> wrote:
> > >
> > > >
> > > > There's a separate issue that says "we make some guarantee that the
> > > > behaviour of an interface remains consistent over versions", which is
> > > > hard to do without some rigorous definition of what the expected
> > > > behaviour of an implementation should be.
> > >
> > >
> > > Good point, Steve. I've assumed the semantics of the API had to
> > > respect its stability attribute (e.g. changing the semantics of
> > > FileSystem#close would be an incompatible change, since this is a
> > > public/stable API, even if the new semantics are arguably better). But
> > > you're right: unless we've actually defined what the semantics of the
> > > APIs are, it's hard to say whether we've materially changed them. How
> > > about adding a new section on the page that calls this out explicitly?
> > >
> > +1.
> > Maybe we should list which bits we consider both well specified and
> > covered by tests that verify the implementations in our svn match that
> > specification.
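One way to read this suggestion, sketched under assumptions: write the specification once as executable checks and run it against every implementation, so "well specified" and "verified" become the same list. `SeekableStream`, `SeekContract`, `StrictStream` and `LaxStream` are hypothetical, and the two rules checked are taken loosely from the seek behaviour discussed in this thread.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Simplified seek contract (assumed for illustration).
interface SeekableStream {
  void seek(long pos) throws IOException;

  long getPos() throws IOException;
}

// Hypothetical contract checks; returns a list of violations (empty = conforms).
final class SeekContract {
  private SeekContract() {}

  static List<String> check(String name, SeekableStream s) {
    List<String> failures = new ArrayList<>();
    // Rule 1: after seek(p), getPos() reports p.
    try {
      s.seek(3);
      if (s.getPos() != 3) {
        failures.add(name + ": getPos() after seek(3) was " + s.getPos());
      }
    } catch (IOException e) {
      failures.add(name + ": seek(3) threw " + e);
    }
    // Rule 2: seeking to a negative offset fails immediately with an IOException.
    try {
      s.seek(-1);
      failures.add(name + ": seek(-1) did not throw");
    } catch (IOException expected) {
      // conforming behaviour
    } catch (RuntimeException e) {
      failures.add(name + ": seek(-1) threw unchecked " + e);
    }
    return failures;
  }
}

// An implementation that follows both rules.
final class StrictStream implements SeekableStream {
  private long pos;

  public void seek(long p) throws IOException {
    if (p < 0) {
      throw new IOException("Cannot seek to negative offset " + p);
    }
    pos = p;
  }

  public long getPos() {
    return pos;
  }
}

// An implementation that silently accepts any offset.
final class LaxStream implements SeekableStream {
  private long pos;

  public void seek(long p) {
    pos = p;
  }

  public long getPos() {
    return pos;
  }
}
```

Running the same `check` over each implementation turns "which bits are well specified" into a concrete report of which implementations violate which rule.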
> > >
> > > In practice I think we'll have to take semantics case by case, clearly
> > > define the semantics we care about in the javadocs (for the major
> > > end-user-facing classes at least, calling out both intended behavior
> > > and behavior that's meant to be undefined), and use individual
> > > judgement elsewhere. For example, HDFS-4156 changed
> > > DataInputStream#seek to throw an IOE if you seek to a negative offset,
> > > instead of succeeding and then resulting in an NPE on the next access.
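A rough before/after illustration of the kind of behaviour change described for HDFS-4156. This is not the actual patch: `BlockFile`, `LenientCursor`, `CheckedCursor` and the 4-byte block size are invented for the sketch, which only shows how a negative offset can be accepted by `seek` and blow up later, versus being rejected where it was passed.

```java
import java.io.EOFException;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical file split into fixed-size blocks; a reader must locate the
// block holding the current offset before it can return a byte.
final class BlockFile {
  static final int BLOCK_SIZE = 4;
  private final byte[][] blocks;
  private final long length;

  BlockFile(byte[] data) {
    int count = (data.length + BLOCK_SIZE - 1) / BLOCK_SIZE;
    blocks = new byte[count][];
    for (int i = 0; i < count; i++) {
      int from = i * BLOCK_SIZE;
      blocks[i] = Arrays.copyOfRange(data, from, Math.min(from + BLOCK_SIZE, data.length));
    }
    length = data.length;
  }

  long length() { return length; }

  /** Returns the block holding pos, or null if no block does. */
  byte[] blockFor(long pos) {
    if (pos < 0 || pos >= length) return null;
    return blocks[(int) (pos / BLOCK_SIZE)];
  }
}

// Before: seek() accepts anything; a bad offset only surfaces on the next read.
final class LenientCursor {
  private final BlockFile file;
  private long pos;

  LenientCursor(BlockFile file) { this.file = file; }

  void seek(long newPos) { pos = newPos; }

  int read() {
    if (pos >= file.length()) return -1;
    byte[] block = file.blockFor(pos); // null for a negative pos
    int b = block[(int) (pos % BlockFile.BLOCK_SIZE)] & 0xff; // NullPointerException here
    pos++;
    return b;
  }
}

// After: the invalid offset is rejected by the call that supplied it.
final class CheckedCursor {
  private final BlockFile file;
  private long pos;

  CheckedCursor(BlockFile file) { this.file = file; }

  void seek(long newPos) throws IOException {
    if (newPos < 0) {
      throw new EOFException("Cannot seek to negative offset " + newPos);
    }
    pos = newPos;
  }

  int read() {
    if (pos >= file.length()) return -1;
    byte[] block = file.blockFor(pos); // never null: 0 <= pos < length here
    int b = block[(int) (pos % BlockFile.BLOCK_SIZE)] & 0xff;
    pos++;
    return b;
  }
}
```

Strictly, callers that caught the later NPE see different behaviour, which is why this counts as a semantic change, just not one anyone plausibly relied on.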
> > >
> > I'd seen that the DFS seek was the best implementation, but hadn't seen
> > the reason. The other ones (especially the Buffered one that goes in front
> > of most others) are much weaker.
> > > That's an incompatible change in terms of semantics, but not of semantics
> > > intended by the author, or semantics that programs are likely to depend on.
> > >
> > That's a key problem: what do people depend on? A lot of the JUnit tests
> > depended on the ordering of methods, after all.
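A small illustration of depending on unspecified behaviour: the Javadoc for `Class.getDeclaredMethods()` says the returned methods are not in any particular order, so code that consumes them in "whatever order comes back" relies on something the platform never promised. The sketch below removes that dependency by sorting explicitly; `SampleTests` and `TestOrder` are hypothetical names.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative test class; the method names are hypothetical.
class SampleTests {
  public void testCreate() {}

  public void testAppend() {}

  public void testDelete() {}
}

final class TestOrder {
  private TestOrder() {}

  /** Names of methods starting with "test", sorted by name so the order no longer depends on the JVM. */
  static List<String> deterministicOrder(Class<?> cls) {
    return Arrays.stream(cls.getDeclaredMethods())
        .map(Method::getName)
        .filter(n -> n.startsWith("test"))
        .sorted(Comparator.naturalOrder())
        .collect(Collectors.toList());
  }
}
```

Sorting makes the order stable, but it only hides the coupling; tests that are independent of each other don't care about order at all.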
> > > However, if a change made FileSystem#close three times slower, that
> > > would perhaps be a smaller semantic change (e.g. it doesn't change which
> > > exceptions get thrown) but probably much less tolerable for end users.
> > >
> > You know that the blobstores all buffer their data so that
> > 1. flush() is a no-op
> > 2. the write takes place on close()
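A minimal sketch of the buffering pattern described in the two points above, assuming a hypothetical `BlobStore` interface that only supports whole-object `put`; real object-store clients differ in detail. `flush()` persists nothing, and the entire upload happens inside `close()`, so the cost of `close()` grows with the amount written. `BlobStore`, `InMemoryBlobStore` and `BlobOutputStream` are invented names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;

// Hypothetical object store: whole-object writes only, no appends.
interface BlobStore {
  void put(String key, byte[] data) throws IOException;
}

final class InMemoryBlobStore implements BlobStore {
  final Map<String, byte[]> objects = new HashMap<>();

  @Override
  public void put(String key, byte[] data) {
    objects.put(key, data.clone());
  }
}

// Buffers everything locally; nothing reaches the store until close().
final class BlobOutputStream extends OutputStream {
  private final BlobStore store;
  private final String key;
  private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
  private boolean closed;

  BlobOutputStream(BlobStore store, String key) {
    this.store = store;
    this.key = key;
  }

  @Override
  public void write(int b) throws IOException {
    if (closed) throw new IOException("stream closed");
    buffer.write(b);
  }

  @Override
  public void write(byte[] b, int off, int len) throws IOException {
    if (closed) throw new IOException("stream closed");
    buffer.write(b, off, len);
  }

  /** No-op: after flush() the data is still only in local memory. */
  @Override
  public void flush() {}

  /** The whole upload happens here, so close() time grows with the bytes written. */
  @Override
  public void close() throws IOException {
    if (closed) return;
    closed = true;
    store.put(key, buffer.toByteArray());
  }
}
```

A caller that treats a returned `flush()` as "the data is safe" would be wrong with this stream; only a successful `close()` means the object exists.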
> > #1 changes durability expectations, while #2 means the time to close() is