Hadoop >> mail # general >> Update on hadoop-0.23
Re: Update on hadoop-0.23
On 30/09/2011 06:27, Eric Baldeschwieler wrote:
> Hi Doug, Jeff, Roman
> Let me rephrase my point.  I'd like to request that folks take bigtop project discussions onto the bigtop lists and don't greet status reports on general@hadoop with insinuations that folks who are working really hard on this project should be contributing different things to another project or are somehow misbehaving by testing on their own infrastructure with their own users.  Any kind of testing is a gift to the community and adds value.  You are all welcome to contribute too.  If you find issues, then file JIRAs and work on the appropriate project lists.  I believe that observing these points of etiquette will help this project continue to prosper.

Bigtop is an attempt to have a coherent test & release process, with
full-stack testing, release artifacts tested on a set of platforms, and
a codebase that has matured out of Cloudera. I don't care about origin;
all I want is consistent releases of compatible artifacts, and the
testing to back up the claims of compatibility. The artifacts should be
the things people install (RPMs, debs). Ideally the tests should start
on small clusters, then scale up to production size before release.

There are things happening in the Hadoop core that mimic some of these
features (RPMs) but appear to lack the full-stack functional
testing that is a goal of Bigtop.

> I agree with you that the Hadoop project is healthy.

How do you define health in this context?

1. There is a 0.20.20x branch that is the one people use in production:
the stable one. Its API is behind the 0.21+ feature set, so it is less
convenient to code against. It picks up features as well as fixes, which
I find troublesome. You don't see new features going into RHEL 5.x or
Ubuntu LTS releases. Yes, I know users like those features, but the demand
for backports may be due to the slow release of new versions that users
trust to work and preserve their data. The branch is healthy, but the
backporting of features creates inertia.

2. There is the 0.23 branch that everyone (especially Arun) is working
on, which is really promising, though some of its features (federation,
YARN) are going to be fairly traumatic to roll out. That doesn't mean
they are bad, only that switching to them will bring surprises.

3. There's 0.22 which is going to combine the API of 0.21 with the fixes
of 0.20.20x *and* will be the last release of the MR1.0 engine. For that
last reason, I think there's value in pushing it out, though it's going
to take time, and there's a risk of it adding another branch to be
maintained for an indeterminate period.

4. There are the third-party "compatible" projects (CDH, MapR, EMC HD,
Amazon Elastic MapReduce), which all declare compatibility with 0.20.x,
with no stated plans for when or how to move to 0.23+.

I would say Hadoop is incredibly successful: it's generating lots of
interest, it is being used by big companies, it has almost single-handedly
revitalised server-side Java development, and it is the foundation for an
OSS version of the MS Azure stack. But for that last goal to be achieved
(it's what I want), we need to move forward on releases where the entire
stack is consistent: releases that people want to use.

For that consistency, I'd like Bigtop to be a subject people can talk
about here, just like MRUnit, which will be needed now that 0.23+ removes
the MiniMRCluster feature.