Re: 0.20.205 Sustaining Release branch plan and content plan
Suresh Srinivas 2011-09-08, 23:47
Here are the patches that are not yet committed to 205. I am working with
Todd, Jitendra and Sanjay on this. We plan to get it done by tomorrow:
HDFS-1207. FSNamesystem.stallReplicationWork should be volatile. (Todd
Lipcon via dhruba)
Risk level: Low - Simple change of a variable to volatile, for multi
HDFS-2309. TestRenameWhileOpen fails. (jitendra)
Risk level: Low - Simple change to introduce first block report flag to fix
a test failure.
HADOOP-6722 Workaround a TCP spec quirk by not allowing NetUtils.connect
to connect to itself
Risk level: Low - checks whether a socket has connected to itself.
HDFS-1252 TestDFSConcurrentFileOperations broken in 0.20-append TODO
Risk level: Low - fixing the test for correctness
HDFS-2300 TestFileAppend4 and TestMultiThreadedSync fail on 20.append
Risk level: Low - simple changes to fix the test failure.
HDFS-1779 After NameNode restart , Clients can not read partial files
even after client invokes Sync.
Risk level: Low - fixes related to bbw block reports. Disabled by append
supported config flag. This has been tested as part of CDH.
HDFS-1186 0.20: DNs should interrupt writers at start of recovery
Risk level: Low - ensures data integrity by preventing further writes on
lease recovery. This has been tested as part of CDH.
HDFS-1260 0.20: Block lost when multiple DNs trying to recover it to
Risk level: Low - the code change looks straightforward. Tested as part of CDH.
HDFS-1122 Don't allow client verification to prematurely add
Risk level: Low - the code change looks straightforward. It handles the
interaction between client verification and DataBlockScanner, which could
incorrectly mark a block as corrupt. Tested as part of CDH.
HDFS-1242 0.20 append: Add test for appendFile() race solved in HDFS-142
Risk level: Low - adds more tests to the already committed change from HDFS-142.
HDFS-1218 20 append: Blocks recovered on startup should be treated
Risk level: Medium. This is a must-fix to prevent data loss if a datanode in
the pipeline goes down. This has been tested in CDH.
HDFS-1197 - Blocks are considered "complete" prematurely after
Risk level: Low. This fixes data loss. This has been tested in CDH.
I am considering a shorter version of the patch to reduce the risk, given
that some of the issues were handled by HDFS-1779.
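For readers unfamiliar with the HADOOP-6722 item above, the kind of check it describes can be sketched roughly as follows. This is only an illustration, not the code from the patch: the class name SelfConnectCheck and both method names are made up for this example, and it assumes that "connected to itself" means the socket's local address and port equal its remote address and port.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.Socket;

// Hypothetical helper for illustration only; not the actual NetUtils code.
public class SelfConnectCheck {

    /** Returns true if the socket's local endpoint is the same as its remote endpoint. */
    public static boolean isSelfConnected(Socket socket) {
        // For an unconnected socket getInetAddress() is null, so equals() is false.
        return socket.getLocalPort() == socket.getPort()
            && socket.getLocalAddress().equals(socket.getInetAddress());
    }

    /** Closes the socket and throws if it turned out to be connected to itself. */
    public static void rejectSelfConnection(Socket socket) throws IOException {
        if (isSelfConnected(socket)) {
            // Capture the endpoint before closing, for the error message.
            String endpoint = String.valueOf(socket.getLocalSocketAddress());
            socket.close();
            throw new ConnectException("Socket connected to itself: " + endpoint);
        }
    }
}
```

A caller would run rejectSelfConnection(socket) right after the connect call returns, so that a self-connected socket is treated like a refused connection instead of being used.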
*The patches I am not planning to add to 205 and the reason:*
HDFS-611 Heartbeats times from Datanodes increase when there are plenty of
blocks to delete
Could be HBase related.
HDFS-1056 Multi-node RPC deadlocks during block recovery
This issue can be worked around by setting the xceiver port using “dfs.datanode.port”.
HDFS-1982 Null pointer exception is thrown when NN restarts with a block
lesser in size than the block present in DN1 but generation stamps is
greater in NN.
Low probability of this occurring. No patch is available yet.
Null pointer exception comes when Namenode recovery happens and there is no
response from client to NN more than the hardlimit for NN recovery and the
current block is more than the prev block size in NN
Not in CDH. Suitable for a subsequent release.
HDFS-1264 0.20: OOME in HDFS client made an unrecoverable HDFS block
No patch available yet.
HDFS-1262 Failed pipeline creation during append leaves lease hanging on
Not relevant to getting flush working, since append is no longer used by HBase.
HDFS-1266 Missing license headers in branch-20-append
Missing license headers - has already been fixed. TODO check
HDFS-1248 Misc cleanup/logging improvements for branch-20-append
Log related cleanup. Not critical for 205.
HDFS-1247 Improvements to HDFS-1204 test
Risk level: Low - Test improvements.