Hadoop, mail # general - [DISCUSS] Apache Hadoop 1.0?


Re: [DISCUSS] Apache Hadoop 1.0?
Steve Loughran 2011-11-17, 10:45
On 17/11/11 02:06, Scott Carey wrote:
>
>
> On 11/16/11 3:51 PM, "Nathan Roberts"<[EMAIL PROTECTED]>  wrote:
>
>> On 11/16/11 4:43 PM, "Arun C Murthy"<[EMAIL PROTECTED]>  wrote:
>>> I propose we adopt the convention that a new major version should be a
>>> superset of the previous major version, features-wise.
>> Just so I'm clear. This is only guaranteed at the time the new major
>> version is started. A day later a previous major line may merge a feature
>> from trunk and then it's no longer the case that 2.x.y is a superset. If
>> that's the case I'm not sure of the value of the convention. We could say
>> that new major versions always start from trunk, but that doesn't have
>> meaning outside of the developer community.
>
> I don't think in general one can say that major versions are a superset of
> previous major versions.  Then you would need to have a SuperMajor version
> number for the (rare) times that this was broken.
> In other words, the major version number really can't have any
> restrictions.
> Perhaps however, one can say that minor versions are supersets of prior
> minor version if one were to define 'superset'.
>
> It's going to be hard to claim that the 0.23 branch is a superset of 0.22
> -- After all, there is no JobTracker and all sorts of stuff has been
> removed or replaced with something else.  Whether that defines a superset
> or not gets into a lot of semantics of what we mean by 'superset'.

> Perhaps like 'feature' or 'bug fix', it is best not to get into the
> semantics of defining what we mean by 'superset' and rather define version
> number meaning only in terms of compatibility classifications.  Especially
> since the compatibility classification has implications for all of these
> other things  -- and IMO more clearly useful ones.  For example, consider
> that a "bug fix" may break wire compatibility, that a tiny harmless change
> can be considered a "new feature", or that replacing a single link in a UI
> could be considered breaking a "superset" rule.
>

I think it would be good to distinguish user-API supersets/subsets from
internal supersets/subsets.

- 0.23 is a superset of the MR and HDFS APIs, compatible with previous
versions (I don't know or care whether it is a proper superset). The
goal here is that end-user apps and higher levels in the stack (in-ASF
and out-ASF) should work, though testing is required to verify this.

A failure of the layers above to work with 0.23+ is something that
should be considered a regression, looked at, and then either dismissed
as "you weren't meant to do that" or fixed.
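
To make the user-API case concrete, here is a minimal Python sketch of a
superset check as plain set containment. The signature strings are
hypothetical placeholders, not real Hadoop APIs, and a real check would
also have to compare behaviour, which is why testing is still required.

```python
def missing_from(old_api, new_api):
    """Return the old user-facing signatures that the new release lacks."""
    return sorted(set(old_api) - set(new_api))


def is_superset(old_api, new_api):
    """True if every old signature is still present in the new release."""
    return not missing_from(old_api, new_api)


# Hypothetical signature lists, for illustration only.
old_release = ["submit_job(conf)", "open(path)", "list_status(path)"]
new_release = ["submit_job(conf)", "open(path)", "list_status(path)", "new_call()"]

print(is_superset(old_release, new_release))
print(missing_from(old_release, ["submit_job(conf)"]))
```

Anything returned by missing_from would be a candidate regression under
the rule above: either dismissed or fixed.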

- 0.23 has changed the back-end means by which jobs are scheduled; the
monitoring APIs have changed, etc. Where people will see a visible
difference is in the JT Web UI. That's not an API-level change.

A failure of any code that goes into this bit of the system to compile
or run against 0.23 is something people can feel slightly sorry about,
but not enough to trigger reversions.

What I will miss in 0.23 is the MiniMRCluster, which I consider to be
part of the API. Certainly it's why I pull
hadoop-common-test-0.20.20x.jar into downstream builds, because it is
the simplest way to do basic tests in JUnit of MR operations. It's also
the most lightweight way to do single-machine Hadoop runs over small
datasets.