Re: [ANNOUNCEMENT] Yahoo focusing on Apache Hadoop, discontinuing "The Yahoo Distribution of Hadoop"
Excellent news! Will you also make Howl, Oozie, and Yarn Apache projects as
On Mon, Jan 31, 2011 at 7:27 PM, Eric Baldeschwieler
> Hi Folks,
> I'm pleased to announce that after some reflection, Yahoo! has decided to
> discontinue "The Yahoo Distribution of Hadoop" and focus on Apache
> Hadoop. We plan to remove all references to a Yahoo distribution from our
> website (developer.yahoo.com/hadoop), close our github repo (
> yahoo.github.com/hadoop-common) and focus on working more closely with the
> Apache community. Our intent is to return to helping Apache produce binary
> releases of Apache Hadoop that are so bulletproof that Yahoo and other
> production Hadoop users can run them unpatched on their clusters.
> Until Hadoop 0.20, Yahoo committers worked as release masters to produce
> binary Apache Hadoop releases that the entire community used on their
> clusters. As the community grew, we have experimented with using the
> "Yahoo! Distribution of Hadoop" as the vehicle to share our work.
> Unfortunately, Apache is no longer the obvious place to go for Hadoop
> releases. The Yahoo! team wants to return to a world where anyone can
> download and directly use releases of Hadoop from Apache. We want to
> contribute to the stabilization and testing of those releases. We also want
> to share our regular program of sustaining engineering that backports minor
> feature enhancements into new dot releases on a regular basis, so that the
> world sees regular improvements coming from Apache every few months, not
> Recently the Apache Hadoop community has been very turbulent. Over the
> last few months we have been developing Hadoop enhancements in our internal
> git repository while doing a complete review of our options. Our commitment
> to open sourcing our work was never in doubt (see http://yhoo.it/e8p3Dd),
> but the future of the "Yahoo distribution of Hadoop" was far from clear.
> We've concluded that focusing on Apache Hadoop is the way forward. We
> believe that more focus on communicating our goals to the Apache Hadoop
> community, and more willingness to compromise on how we get to those goals,
> will help us get back to making Hadoop even better.
> Unfortunately, we now have to sort out how to contribute several
> person-years' worth of work to Apache to let us unwind the Yahoo! git
> repositories. We currently run two lines of Hadoop development, our
> sustaining program (hadoop-0.20-sustaining) and hadoop-future.
> Hadoop-0.20-sustaining is the stable version of Hadoop we currently run on
> Yahoo's 40,000 nodes. It contains a series of fixes and enhancements that
> are all backwards compatible with our "Hadoop 0.20 with security". It is
> our most stable and high performance release of Hadoop ever. We've expended
> a lot of energy finding and fixing bugs in it this year. We have initiated
> the process of contributing this work to Apache in the branch:
> hadoop/common/branches/branch-0.20-security. We've proposed calling this
> the 20.100 release. Once folks have had a chance to try this out and we've
> had a chance to respond to their feedback, we plan to create 20.100 release
> candidates and ask the community to vote on making them Apache releases.
> Hadoop-future is our new feature branch. We are working on a set of new
> features for Hadoop to improve its availability, scalability and
> interoperability to make Hadoop more usable in mission-critical deployments.
> You're going to see another burst of email activity from us as we work to
> get hadoop-future patches socialized, reviewed and checked in. These bulk
> checkins are exceptional. They are the result of us striving to be more
> transparent. Once we've merged our hadoop-future and hadoop-0.20-sustaining
> work back into Apache, folks can expect us to return to our regular
> development cadence. Looking forward, we plan to socialize our roadmaps
> regularly, actively synchronize our work with other active Hadoop