I'd like to propose a vote on committing HDFS-630 to 0.21 (it's already
been committed to TRUNK).
HDFS-630 has the dfsclient pass the namenode the names of datanodes it has
determined to be dead, for example because a connection attempt to them
failed. This is useful in the interval between a datanode dying and the
namenode timing out its lease. Without this fix, the namenode can often
give out the dead datanode as a host for a block. If the cluster is small,
fewer than 5 or 6 nodes, then it's very likely the namenode will give out
the dead datanode as a block host.
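To make the small-cluster point concrete, here is a rough Java sketch. It is
a toy model, not the actual namenode or DFSClient code: the class and method
names are made up for illustration, and it assumes 3 replicas per block and
(for the probability) a uniformly random choice of targets, which is simpler
than real HDFS block placement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model only. All names here are hypothetical, not Hadoop APIs.
public class ExcludeNodesSketch {

    // Picks up to `replicas` targets from `nodes`, skipping any datanode
    // the client has reported as dead (the idea behind HDFS-630).
    static List<String> chooseTargets(List<String> nodes, int replicas,
                                      Set<String> excluded) {
        List<String> targets = new ArrayList<>();
        for (String node : nodes) {
            if (targets.size() == replicas) {
                break;
            }
            if (!excluded.contains(node)) {
                targets.add(node);
            }
        }
        return targets;
    }

    // Assumption: targets are `replicas` distinct nodes chosen uniformly at
    // random. Then one given dead node is among them with chance r/n.
    static double chanceDeadNodeChosen(int clusterSize, int replicas) {
        return Math.min(1.0, (double) replicas / clusterSize);
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("dn1", "dn2", "dn3", "dn4");

        // No exclusions: dn2 can be handed out even if it is dead.
        System.out.println(chooseTargets(nodes, 3, Set.of()));

        // Client reports dn2 as dead: it is skipped.
        System.out.println(chooseTargets(nodes, 3, Set.of("dn2")));

        // 4-node cluster, 3 replicas: 0.75 under the uniform assumption.
        System.out.println(chanceDeadNodeChosen(4, 3));
    }
}
```

Under those assumptions, a 4-node cluster hands out the dead datanode for
about three blocks in four, while a 30-node cluster does so for about one in
ten, which is why small clusters feel this most.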
Small clusters are common in hbase, especially when folks are starting out
or evaluating hbase. They'll start with three or four nodes carrying both
datanodes+hbase regionservers. They'll experiment with killing one of the
slaves -- datanode and regionserver -- and watch what happens. What follows
is a struggling dfsclient trying to create replicas where one of the
datanodes passed to us by the namenode is dead. DFSClient will fail and then
go back to the namenode again, etc. (See
https://issues.apache.org/jira/browse/HBASE-1876 for more detailed
blow-by-blow). HBase operation will be held up during this time and
eventually a regionserver will shut itself down to protect itself against
data loss if we can't successfully write to HDFS.