On 8/28/09 1:41 PM, "Dhruba Borthakur" <[EMAIL PROTECTED]> wrote:
> When a user issues the decommission command, all the blocks that are
> currently residing on it are inserted into the to-be-replicated queue. Then
> the ReplicationMonitor inside the namenode starts replicating these blocks
> (during this period, the replica on the machine being decommissioned is used
> for reads, but is not considered a valid replica by the ReplicationMonitor).
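To make the quoted flow concrete, here is a minimal Python sketch under a simplified, hypothetical model. The names `Block`, `valid_replicas`, and `start_decommission` are illustrative only and are not namenode code. Blocks on the decommissioning node are appended to a plain FIFO queue. Replicas on that node are excluded from the "valid" count even though they can still serve reads.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Block:
    # Hypothetical model: a block id and the set of datanodes holding a replica.
    block_id: int
    replica_nodes: set = field(default_factory=set)


def valid_replicas(block, decommissioning):
    """Count replicas the replication monitor treats as valid.

    Replicas on decommissioning nodes can still serve reads, but are
    excluded from this count, as described in the quoted message.
    """
    return len(block.replica_nodes - decommissioning)


def start_decommission(node, blocks, decommissioning, to_be_replicated):
    """Mark `node` as decommissioning and queue every block it holds.

    Blocks are appended in the order they are scanned (plain FIFO), with
    no regard for how many valid replicas each block has left.
    """
    decommissioning.add(node)
    for block in blocks:
        if node in block.replica_nodes:
            to_be_replicated.append(block)


blocks = [
    Block(1, {"dn-a", "dn-b", "dn-c"}),
    Block(2, {"dn-a"}),
    Block(3, {"dn-a", "dn-d"}),
]
decommissioning = set()
queue = deque()
start_decommission("dn-a", blocks, decommissioning, queue)
print([b.block_id for b in queue])  # [1, 2, 3]
print([valid_replicas(b, decommissioning) for b in queue])  # [2, 0, 1]
```

In this toy run, block 2 has no valid replica left outside the decommissioning node. It still waits behind block 1, which has two.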
OK, so it sounds like there is no real ordering of the blocks in the to-be-replicated queue.
It follows, then, that even during normal operation there is a scary edge
case when a block is down to one replica and a decommission is
triggered. While the namenode is busy replicating blocks that can be
fetched from multiple sources, any file that suddenly finds itself down to a
single replica of a block may end up corrupted if that replica somehow gets
lost (e.g., its node dies).
I guess I'll file a JIRA to make replication smarter. There should probably
be queues based on the number of live replicas versus the expected number of
replicas, so that higher-risk blocks are replicated first.
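Here is a rough Python sketch of the proposed idea. It is not an existing HDFS API. `Block` and `ReplicationQueue` are hypothetical names. The sketch uses a single heap keyed by the ratio of live to expected replicas, so the most under-replicated blocks come out first. A set of discrete priority buckets would be an equally reasonable reading of "queues".

```python
import heapq
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    # Hypothetical model of an under-replicated block.
    block_id: int
    live_replicas: int
    expected_replicas: int


class ReplicationQueue:
    """Hand out blocks with the lowest live/expected replica ratio first.

    Ties at the same ratio are broken by insertion order, so blocks of
    equal risk are still served FIFO.
    """

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def add(self, block):
        ratio = block.live_replicas / block.expected_replicas
        # The unique counter means Block objects are never compared directly.
        heapq.heappush(self._heap, (ratio, next(self._counter), block))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


q = ReplicationQueue()
q.add(Block(1, live_replicas=2, expected_replicas=3))
q.add(Block(2, live_replicas=1, expected_replicas=3))
q.add(Block(3, live_replicas=2, expected_replicas=3))
print([q.pop().block_id for _ in range(len(q))])  # [2, 1, 3]
```

Block 2 is added second, but it jumps ahead because it has only one of three expected replicas left. Blocks 1 and 3 carry equal risk, so they keep their arrival order.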