0.20.205 Sustaining Release branch plan and content plan
Matt Foley 2011-09-07, 10:23
Hi all,

Over the past week, a number of people have provided input for patches they would like to see in 205, with reasons and risk evaluations; please see the threads "Content request for 0.20.205 Sustaining Release" and "Add Append-HBase support in upcoming 20.205". Thanks to all who took the effort to share this information with the list.
The various patches are grouped below, in numeric order for ease of review. My proposed plan for creating branch-0.20.205 is at the end of this message.
Comparing the requests with the patches currently in branch-0.20-security, we have the following:
1. THESE PATCHES ARE ALREADY IN 20-security AND ARE REQUESTED FOR INCLUSION IN 205:
HADOOP-6833. IPC leaks call parameters when exceptions thrown. (Todd Lipcon via eli)
HADOOP-6889. Make RPC to have an option to timeout - backport to 0.20-security. Unit tests updated to 17/Aug/2011 version. (John George and Ravi Prakash via mattf)
HADOOP-7314. Add support for throwing UnknownHostException when a host doesn't resolve. Needed for MAPREDUCE-2489. (Jeffrey Naisbitt via mattf)
HADOOP-7432. Back-port HADOOP-7110 to 0.20-security: Implement chmod in NativeIO library. (Sherry Chen via mattf)
HADOOP-7472. RPC client should deal with IP address change. (Kihwal Lee via suresh)
HADOOP-7539. merge hadoop archive goodness from trunk to .20 (John George via mahadev)
HDFS-0142. Blocks that are being written by a client are stored in the blocksBeingWritten directory. (Dhruba Borthakur, Nicolas Spiegelberg, Todd Lipcon via dhruba)
HDFS-0200. Support append and sync for hadoop 0.20 branch. (dhruba)
HDFS-0561. Fix write pipeline READ_TIMEOUT. (Todd Lipcon via dhruba)
HDFS-0606. Fix ConcurrentModificationException in invalidateCorruptReplicas. (Todd Lipcon via dhruba)
HDFS-0630. Client can exclude specific nodes in the write pipeline. (Nicolas Spiegelberg via dhruba)
HDFS-0724. Use a bidirectional heartbeat to detect stuck pipeline. (hairong)
HDFS-0826. Allow a mechanism for an application to detect that datanode(s) have died in the write pipeline. (dhruba)
HDFS-0895. Allow hflush/sync to occur in parallel with new writes to the file. (Todd Lipcon via hairong)
HDFS-0988. Fix bug where savenameSpace can corrupt edits log. (Nicolas Spiegelberg via dhruba)
HDFS-1054. remove sleep before retry for allocating a block. (Todd Lipcon via dhruba)
HDFS-1057. Concurrent readers hit ChecksumExceptions if following a writer to very end of file (Sam Rash via dhruba)
HDFS-1118. Fix socket leak on DFSClient. (Zheng Shao via dhruba)
HDFS-1141. completeFile does not check lease ownership. (Todd Lipcon via dhruba)
HDFS-1164. TestHdfsProxy is failing. (Todd Lipcon)
HDFS-1202. DataBlockScanner throws NPE when updated before initialized. (Todd Lipcon)
HDFS-1204. Lease expiration should recover single files, not entire lease holder (Sam Rash via dhruba)
HDFS-1210. DFSClient should log exception when block recovery fails. (Todd Lipcon via dhruba)
HDFS-1211. Block receiver should not log "rewind" packets at INFO level. (Todd Lipcon)
HDFS-1346. DFSClient receives out of order packet ack. (hairong)
HDFS-1520. Lightweight NameNode operation recoverLease to trigger lease recovery. (Hairong Kuang via dhruba)
HDFS-1554. New semantics for recoverLease. (hairong)
HDFS-1555. Disallow pipeline recovery if a file is already being lease recovered. (hairong)
HDFS-1836. Thousands of CLOSE_WAIT sockets. Contributed by Todd Lipcon, ported to security branch by Bharath Mundlapudi. (via mattf)
HDFS-2053. Bug in INodeDirectory#computeContentSummary warning (Michael Noll via eli)
HDFS-2117. DiskChecker#mkdirsWithExistsAndPermissionCheck may return true even when the dir is not created. (eli)
HDFS-2190. NN fails to start if it encounters an empty or malformed fstime file. (atm)
HDFS-2202. Add a new DFSAdmin command to set balancer bandwidth of datanodes without restarting. (Eric Payne via szetszwo)
MAPREDUCE-2187. Reporter sends progress during sort/merge. (Anupam Seth via acmurthy)
MAPREDUCE-2324. Removed usage of broken ResourceEstimator.getEstimatedReduceInputSize to check against usable disk-space on TaskTracker. (Robert Evans via acmurthy)
MAPREDUCE-2489. Jobsplits with random hostnames can make the queue unusable. (Jeffrey Naisbitt via mahadev)
MAPREDUCE-2494. Make the distributed cache delete entries using LRU priority (Robert Joseph Evans via mahadev)
MAPREDUCE-2650. back-port MAPREDUCE-2238 to 0.20-security. (Sherry Chen via mahadev)
MAPREDUCE-2705. Implements launch of multiple tasks concurrently. (Thomas Graves via ddas)
MAPREDUCE-2729. Ensure jobs with reduces which can't be launched due to slow-start do not count for user-limits. (Sherry Chen via acmurthy)
MAPREDUCE-2780. Use a utility method to set service in token. (Daryn Sharp via jitendra)
MAPREDUCE-2852. Jira for YDH bug 2854624. (Kihwal Lee via eli)
2. THESE PATCHES ARE ALREADY IN 20-security BUT NO ONE HAS YET SPOKEN FOR INCLUDING THEM IN 205:
HADOOP-7400. Fix HdfsProxyTests fails when the -Dtest.build.dir and -Dbuild.test is set a dir other than build dir (gkesavan).
HADOOP-7594. Support HTTP REST in HttpServer. (szetszwo)
HADOOP-7596. Makes packaging of 64-bit jsvc possible. Has other bug fixes to do with packaging. (Eric Yang via ddas)
HDFS-1207. FSNamesystem.stallReplicationWork should be volatile. (Todd Lipcon via dhruba)
HDFS-2259. DN web-UI doesn't work with paths that contain html. (eli)
HDFS-2309. TestRenameWhileOpen fails. (jitendra)
MAPREDUCE-7343. Make the number of warnings accepted by test-patch configurable to limit false positives. (Thomas Graves via cdouglas)
3. THESE PATCHES ARE REQUESTED FOR INCLUSION IN 205, BUT ARE NOT YET IN 20-security:
Additional append issues (proponents Todd and S
Re: 0.20.205 Sustaining Release branch plan and content plan
Suresh Srinivas 2011-09-08, 23:47
Here are the patches that are not yet all committed to 205. I am working with Todd, Jitendra and Sanjay on this. We plan to get it done by tomorrow:

HDFS-1207. FSNamesystem.stallReplicationWork should be volatile. (Todd Lipcon via dhruba) Risk level: Low - Simple change of a variable to volatile, for multithreaded correctness.
HDFS-2309. TestRenameWhileOpen fails. (jitendra) Risk level: Low - Simple change to introduce first block report flag to fix a test failure.
HADOOP-6722 Workaround a TCP spec quirk by not allowing NetUtils.connect to connect to itself Risk level: Low - checks whether a socket has connected to itself.
HDFS-1252 TestDFSConcurrentFileOperations broken in 0.20-append TODO suresh Risk level: Low - fixing the test for correctness
HDFS-2300 TestFileAppend4 and TestMultiThreadedSync fail on 20.append Risk level: Low - simple changes to fix the test failure.
HDFS-1779 After NameNode restart, Clients can not read partial files even after client invokes Sync. Risk level: Low - fixes related to bbw (blocksBeingWritten) block reports. Disabled by the append-support config flag. This has been tested as part of CDH.
HDFS-1186 0.20: DNs should interrupt writers at start of recovery Risk level: Low - ensures data integrity by preventing further writes on lease recovery. This has been tested as part of CDH.
HDFS-1260 0.20: Block lost when multiple DNs trying to recover it to different genstamps Risk level: Low - the code change looks straightforward. Tested as part of CDH.
HDFS-1122 Don't allow client verification to prematurely add Risk level: Low - the code change looks straightforward. Handles the interaction of client verification with DataBlockScanner, and a block being incorrectly marked corrupt. Tested as part of CDH.
HDFS-1242 0.20 append: Add test for appendFile() race solved in HDFS-142 Risk level: Low - adds more tests to the already committed change from HDFS-142.
HDFS-1218 20 append: Blocks recovered on startup should be treated Risk level: Medium. This is a must-fix to prevent data loss if a datanode goes down in the pipeline. This has been tested in CDH.
HDFS-1197 - Blocks are considered "complete" prematurely after Risk level: Low. This fixes data loss. This has been tested in CDH. We are considering a shorter version of the patch, to reduce the risk, since some of the issues were handled by HDFS-1779.

*The patches I am not planning to add to 205 and the reason:*

HDFS-611 Heartbeats times from Datanodes increase when there are plenty of blocks to delete Could be HBase related.
HDFS-1056 Multi-node RPC deadlocks during block recovery Setting up xceiver port using “dfs.datanode.port” to work around this issue.
HDFS-1982 Null pointer exception is thrown when NN restarts with a block lesser in size than the block present in DN1 but generation stamps is greater in NN. Low probability of this occurring. No patch is available yet.
HDFS-1951/HDFS-1970 Null pointer exception comes when Namenode recovery happens and there is no response from client to NN more than the hardlimit for NN recovery and the current block is more than the prev block size in NN Not in CDH. Suitable for a subsequent release.
HDFS-1264 0.20: OOME in HDFS client made an unrecoverable HDFS block No patch available yet.
HDFS-1262 Failed pipeline creation during append leaves lease hanging on NN Not relevant to get flush as append is no longer used by HBase.
HDFS-1266 Missing license headers in branch-20-append Missing license headers - has already been fixed. TODO check
HDFS-1248 Misc cleanup/logging improvements for branch-20-append Log related cleanup. Not critical for 205.
HDFS-1247 Improvements to HDFS-1204 test Risk level: Low - Test improvements.
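The HADOOP-6722 risk note above describes the fix as checking whether a socket has connected to itself. TCP permits a "simultaneous open", so a client whose ephemeral local port happens to equal the destination port on the same host can end up connected to itself rather than to a server. A minimal Python sketch of that check follows, assuming a plain TCP client; `is_self_connected` and `connect_checked` are hypothetical names, and this is an illustration of the idea, not Hadoop's Java `NetUtils.connect` code.

```python
import socket


def is_self_connected(sock: socket.socket) -> bool:
    """Return True if a connected TCP socket's local and remote endpoints match.

    TCP allows a "simultaneous open", so a client whose ephemeral local port
    equals the destination port on the same host can connect to itself.
    """
    return sock.getsockname() == sock.getpeername()


def connect_checked(host: str, port: int, timeout: float = 5.0) -> socket.socket:
    """Hypothetical helper: connect, then reject a self-connected socket.

    Illustrative only; this is not Hadoop's NetUtils.connect.
    """
    sock = socket.create_connection((host, port), timeout=timeout)
    if is_self_connected(sock):
        local = sock.getsockname()  # read before close(); it fails afterwards
        sock.close()
        # Refuse instead of handing the caller a socket that talks to itself.
        raise ConnectionRefusedError(f"socket connected to itself at {local}")
    return sock
```

Raising and letting the caller retry is only one possible policy; the essential part is comparing the local and peer addresses right after the connect succeeds.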
Re: 0.20.205 Sustaining Release branch plan and content plan
Steve Loughran 2011-09-09, 10:45
On 09/09/11 00:47, Suresh Srinivas wrote:
> Here is the patch that are not all committed to 205 yet. I am working with Todd, Jitendra and Sanjay on this. We plan to get it done by tomorrow:
> HDFS-1207. FSNamesystem.stallReplicationWork should be volatile. (Todd Lipcon via dhruba)
> Risk level: Low - Simple change of a variable to volatile, for multi threaded correctness

What about RHEL6.1 workarounds? https://issues.apache.org/jira/browse/HADOOP-7156
Matt Foley 2011-09-09, 23:56
Re: 0.20.205 Sustaining Release branch plan and content plan
Todd Lipcon 2011-09-10, 00:09
On Fri, Sep 9, 2011 at 4:56 PM, Matt Foley <[EMAIL PROTECTED]> wrote:
> If I read the jira correctly, this is a workaround for RHEL6.0 that is no longer needed for RHEL6.1.
> Is that correct? If so, would it be no longer needed?

Yes, it's fixed in RHEL 6.1. Also, since the uid caching is enabled in the 20x series, it's less important, since the race is much rarer. So I'm +/- 0 (doesn't seem urgent but shouldn't hurt things).

-Todd

> On Fri, Sep 9, 2011 at 3:45 AM, Steve Loughran <[EMAIL PROTECTED]> wrote:
>> What about RHEL6.1 workarounds?
>> https://issues.apache.org/jira/browse/HADOOP-7156

--
Todd Lipcon
Software Engineer, Cloudera
Re: 0.20.205 Sustaining Release branch plan and content plan
Matt Foley 2011-09-10, 00:11
[REMINDER: branch-0.20.205 will be created this weekend.]
Regarding "group 2" of the planning message, there were seven orphan patches, already in 0.20-security, but not yet spoken for in 205:
> HADOOP-7400. Fix HdfsProxyTests fails when the -Dtest.build.dir and -Dbuild.test is set a dir other than build dir (gkesavan).
> HADOOP-7594. Support HTTP REST in HttpServer. (szetszwo)
> HADOOP-7596. Makes packaging of 64-bit jsvc possible. Has other bug fixes to do with packaging. (Eric Yang via ddas)
> HDFS-1207. FSNamesystem.stallReplicationWork should be volatile. (Todd Lipcon via dhruba)
> HDFS-2259. DN web-UI doesn't work with paths that contain html. (eli)
> HDFS-2309. TestRenameWhileOpen fails. (jitendra)
> MAPREDUCE-7343. Make the number of warnings accepted by test-patch configurable to limit false positives. (Thomas Graves via cdouglas)
Here is their disposition:
HADOOP-7400 and HADOOP-7596 are build/package infrastructure issues. I need them for the release, so they will be included.
HDFS-1207 is needed for append, is requested by Suresh, and will be included.
HDFS-2259 is recommended by Eli, and will be included.
HDFS-2309 fixes a bug detected by a failing unit test in the 0.20 build. It will be included.
HADOOP-7594 is requested by Sanjay, and will be included.
MAPREDUCE-7343 doesn't exist! It is really a reference to HADOOP-7343. It is requested by Nathan, and will be included. (I fixed the CHANGES.txt reference and caused the commit to show - as much as possible - in the jira.)
So in summary, all the orphans have been claimed and championed. --Matt
Re: 0.20.205 Sustaining Release branch plan and content plan
Eric Baldeschwieler 2011-09-30, 05:34
Let me give a shout out to all the folks who are helping to make this work. 20.205 has gotten a lot of things done in a very short period of time, and it's exciting to see such a diverse group of folks pushing together to drive this forward! I'm looking forward to seeing security and HBase run together! And the new HDFS HTTP APIs are going to open up a lot of interesting possibilities!
Thanks all!
E14
Re: 0.20.205 Sustaining Release branch plan and content plan
Matt Foley 2011-09-13, 01:14
Hi Roman,

Normally I would say no, but these seem to be build issues. Let me consult with Giri.

Thanks,
--Matt

On Mon, Sep 12, 2011 at 4:37 PM, Roman Shaposhnik <[EMAIL PROTECTED]> wrote:
> Matt,
>
> sorry for chiming in late, but is there a chance for these fixes to be committed to 0.20.205 branch?
> https://issues.apache.org/jira/browse/HADOOP-6436
> https://issues.apache.org/jira/browse/MAPREDUCE-2127
> https://issues.apache.org/jira/browse/HDFS-2327 (or HDFS-2325)
>
> Thanks,
> Roman.
Re: 0.20.205 Sustaining Release branch plan and content plan
Matt Foley 2011-09-14, 23:23
Hi Roman,
HADOOP-6436 is a large patch, which needs to be back-ported. MAPREDUCE-2127 is small, but needs to be back-ported. HDFS-2325 has no patch, although one is sketched in the jira. However, it is marked as a 205 blocker.
Can you help us by creating and trying any of these patches for branch-0.20-security? HDFS-2325 and MAPREDUCE-2127 seem to be the highest priority. If you can help provide those, we'll get them into rc1, assuming there is one, which seems likely since there are a couple of unit tests failing too.
Thanks, --Matt
On Wed, Sep 14, 2011 at 2:02 PM, Roman Shaposhnik <[EMAIL PROTECTED]> wrote:
> On Mon, Sep 12, 2011 at 6:14 PM, Matt Foley <[EMAIL PROTECTED]> wrote:
> > Hi Roman,
> > Normally I would say no, but these seem to be build issues. Let me consult with Giri.
>
> Great! What's the verdict?
>
> Thanks,
> Roman.