Hi Arun, et al.,
I hope you don't mind a non-contributor butting in here. I'm currently a
Hadoop administrator and former application developer (non-Hadoop).
Regarding GA release changes, I think Arun has a lot of good ideas here.
I think it's better to add new features via new flags, parameters, etc.,
and to deprecate "abandoned" or "bad" defaults, values, etc. At the rate
Hadoop is changing, I think you could deprecate in GA 0.30 and change
defaults in a later GA release.
As a user, that would allow me to upgrade to a new GA version without
significant changes to my config. As we are ready to introduce new
features, we could add the required changes to configs.
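To sketch what that deprecation pattern might look like (this is a rough,
hypothetical illustration, not Hadoop's actual Configuration class, though
it is similar in spirit to the key-deprecation support Hadoop's own
org.apache.hadoop.conf.Configuration provides):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a config lookup that resolves deprecated keys to
// their replacements, so an old config file keeps working after a key is
// renamed in a new GA release instead of silently being ignored.
public class DeprecatedConf {
    private final Map<String, String> values = new HashMap<>();
    // Maps deprecated (old) key -> replacement (new) key.
    private final Map<String, String> deprecations = new HashMap<>();

    // Register that oldKey is deprecated in favor of newKey.
    public void addDeprecation(String oldKey, String newKey) {
        deprecations.put(oldKey, newKey);
    }

    public void set(String key, String value) {
        values.put(key, value);
    }

    // Look up key; if only the deprecated key is set, honor it anyway.
    // A real implementation would log a deprecation warning here.
    public String get(String key) {
        if (values.containsKey(key)) {
            return values.get(key);
        }
        for (Map.Entry<String, String> e : deprecations.entrySet()) {
            if (e.getValue().equals(key) && values.containsKey(e.getKey())) {
                return values.get(e.getKey());
            }
        }
        return null;
    }
}
```

With something like this in place, a release can rename a key and keep old
configs working for at least one GA cycle before removing the old name.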
Please, no changes that require me to "migrate" data between dot releases.
I fully expect that applications that run CentOS 6.2 will run on 6.3 with
no problems. CentOS 5.6 to 6.3 is another matter, as expected.
As it stands, we're deployed on Hadoop 1.x in prod and plan to test 2.x
for several months before upgrading.
I know you guys are excited about all of the cool improvements you're
making. Just try to remember that Hadoop adoption is growing by leaps and
bounds; breaking things for the sake of "better" is not always good for the
community.
Just my $0.02
On Wed, Jan 30, 2013 at 8:10 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
> The discussions in HADOOP-9151 were related to wire-compatibility. I think
> we all agree that breaking API compatibility is not allowed without
> deprecating them first in a prior major release - this is something we have
> followed since hadoop-0.1.
> I agree we need to spell out what changes we can and cannot do *after* we
> go GA, e.g.:
> # Clearly incompatible *API* changes are *not* allowed in hadoop-2 post-GA.
> # Do we allow incompatible changes on Client-Server protocols? I would say
> no.
> # Do we allow incompatible changes on internal-server protocols (e.g.
> NN-DN or NN-NN in HA setup or RM-NM in YARN) to ensure we support
> rolling-upgrades? I would like to not allow this, but I do not know how
> feasible this is. An option is to allow these changes between minor
> releases i.e. between hadoop-2.10 and hadoop-2.11.
> # Do we allow changes which force an HDFS metadata upgrade between minor
> releases, i.e. hadoop-2.20 to hadoop-2.21?
> # Clearly *no* incompatible (API/client-server/server-server) changes
> are allowed in a patch release, i.e. hadoop-2.20.0 and hadoop-2.20.1 have
> to be compatible in all respects.
> What else am I missing?
> I'll make sure we update our Roadmap wiki and other docs post this
> discussion.
> On Jan 30, 2013, at 4:21 PM, Eli Collins wrote:
> > Thanks for bringing this up, Arun. One of the issues is that we
> > haven't been clear about what types of compatibility breakage are
> > allowed, and which are not. For example, renaming FileSystem#open is
> > incompatible, and not OK, regardless of the alpha/beta tag. Breaking
> > server/server APIs is OK pre-GA but probably not post-GA, at least not
> > in a point release, unless required for a security fix, etc.
> > Configuration, data format, and environment variable changes, etc., can
> > all be similarly incompatible. The issue we had in HADOOP-9151 was that
> > someone claimed it was not an incompatible change because it didn't
> > break API compatibility, even though it broke wire compatibility. So
> > let's be clear about the types of incompatibility we are or are not
> > permitting.
> > For example, will it be OK to merge a change before 2.2.0-beta that
> > requires an HDFS metadata upgrade? Or breaks client server wire
> > compatibility? I've been assuming that changing an API annotated
> > Public/Stable still requires multiple major releases (one to deprecate
> > and one to remove); does the alpha label change that? To some people
> > the "alpha" and "beta" labels imply instability in terms of
> > quality/features, while to others they mean unstable APIs (and to some
> > both), so it would be good to spell that out. In short, agree that we