[DISCUSSION] Thinking about 20.204 and beyond
Eric Baldeschwieler 2011-06-18, 06:50
Along with starting a new release off the mainline (see previous mail), the Yahoo! team plans to continue producing sustaining releases off the Hadoop with security branch, such as 0.20.203. I'm writing this email to outline our plans, explain Yahoo's motivation for supporting this work, and request feedback and, hopefully, your endorsement. This initiative stems from Yahoo's commitment to do its Hadoop work in Apache and discontinue the Yahoo Distribution of Hadoop (http://yhoo.it/i9Ww8W).
We hope to produce a new 0.20.204 release in Apache in the next few weeks. Owen O'Malley is planning to act as release master for this release. This will be based on work in the hadoop-with-security branch, just as 0.20.203 was, but will include bug fixes and enhancements beyond those in 0.20.203. This is one in a series of releases we hope to do in the next 6-9 months as Hadoop 0.23 (or whatever the community chooses to call it) goes through the various stages of stability testing and burn-in.
CONTENTS OF THE RELEASE:
- RPM & .deb packaging to ease deployment (backported from trunk)
  - I am excited to see Hadoop released with .deb & RPM packaging from Apache for the first time.
  - This will greatly ease deployment.
- Disk fail in place (merged with trunk, except for some MR changes that conflict with MR-279; these will be reimplemented in MR-279)
  - This change was motivated by operational problems we observed with our new 12-disk machines.
  - This work should greatly improve Hadoop availability by keeping nodes working when one of their disks fails.
- Lots of additional fixes (I've included the change log below)
WHY THIS PROCESS:
Producing a stable release of Hadoop is a long, hard and expensive process. Historically Y! has produced all such releases. Other releases of Hadoop have either not been stable (Hadoop 0.19 and Hadoop 0.21) or have been based on a stable Apache release driven by Yahoo (CDH and Facebook). Once we've paid the price of making a stable release, it makes a lot of sense to accept safe improvements as well as bug fixes. Doing so allows one to get customer-impacting improvements into production in days, rather than the years it would take if one waited for changes to come in the next stable release off the Hadoop mainline. Given that it takes many months to stabilize trunk, there is no way to get new easy fixes into users' hands quickly via a new mainline release.
For the last few years Yahoo has done sustaining engineering in open source via Github. These patches have been contributed to Apache Hadoop mainline and backported to the sustaining branch on Github (for Yahoo 0.20, for example). We've then cut Yahoo releases from Github. Cloudera and Facebook have also taken these patches from Github and incorporated these improvements into their releases, so the community has benefitted from this process for years. What we are planning to do now is simply move this process into Apache, so that Apache releases themselves are timely and relevant, not always a year or two behind what users need.
How do I propose making these decisions? Deciding what is a safe patch is a judgement call. Apache process suggests that the release manager makes these calls (http://bit.ly/mJcBjc). For releases Y! champions, such as 0.20 (Arun & Owen), we are ready to do the sustaining engineering, make these calls and stand our reputation behind the quality of the result. Other release masters are championing other Apache Hadoop releases currently (Nigel and Tom) and I think they should be free to do the same. For Hadoop 20.204, I propose pushing what is currently in the hadoop-with-security branch. Part of the reason for this thread is to socialize this process, so that community members can champion stable patches for inclusion in 20.205. In the future I propose that a branch's release master request suggestions for future releases on this list, but be free to use their judgement on what is accepted (pretty much what Nigel is doing on 0.22 today).
The vote on 0.20.203 was acrimonious, but I believe that 0.20.203 was a useful step forward for Apache Hadoop. 0.20.204 will again be the best stable release of Apache Hadoop ever. I hope folks can support the effort. With your contribution 0.20.205 can be even better, fixing issues that plague your Hadoop clusters. This email is part of a wider effort from the Yahoo team to co-plan our work with the community.
eric14 a.k.a. Eric Baldeschwieler
VP Hadoop Software Development @Yahoo!
Release 0.20.204.0 - unreleased
HADOOP-6255. Create RPM and Debian packages for common. Changes deployment
layout to be consistent across the binary tgz, rpm, and deb. Adds setup
scripts for easy one node cluster configuration and user creation.
(Eric Yang via omalley)
MAPREDUCE-2495. exit() the TaskTracker when the distributed cache cleanup
thread dies. (Robert Joseph Evans via cdouglas)
HDFS-1878. TestHDFSServerPorts unit test failure - race condition
in FSNamesystem.close() causes NullPointerException without serious
MAPREDUCE-2452. Moves the cancellation of delegation tokens to a separate
MAPREDUCE-2555. Avoid spurious logging from completedtasks. (Thomas Graves
MAPREDUCE-2451. Log the details from health check script at the
JobTracker. (Thomas Graves via cdouglas)
MAPREDUCE-2535. Fix NPE in JobClient caused by retirement. (Robert Joseph
Evans via cdouglas)
MAPREDUCE-2456. Log the reduce taskID and associated TaskTrackers with
failed fetch notifications in the JobTracker log.
(Jeffrey Naisbitt via cdouglas)
HDFS-2044. TestQueueProcessingStatistics failing automatic test due to
timing issues. (mattf)
HADOOP-7248. Update eclipse target to generate .classpath from ivy config.