See my comments inline:
On Wed, Mar 14, 2012 at 9:24 AM, Giovanni Marzulli <[EMAIL PROTECTED]> wrote:
> I'm trying HDFS on a small test cluster and I need to clarify some doubts
> about hadoop behaviour.
> Some details of my cluster:
> Hadoop version: 0.20.2
> I have two racks (rack1, rack2). Three datanodes for every rack.
> Replication factor is set to 3.
> "HDFS’s placement policy is to put one replica on one node in the local
> rack, another on a node in a different (remote) rack, and the last on a
> different node in the same remote rack."
> Instead, I noticed that sometimes, a few blocks of files are stored as
> follows: two replicas in the local rack and a replica in a different rack. Are
> there exceptions that cause different behaviour than default placement?
Your description of replica placement is correct. However, a node chosen
under this policy may not be a good target because of the traffic on the
node, its remaining space, etc. See BlockPlacementPolicyDefault#isGoodTarget().
Given the small cluster size, you may be seeing different behavior depending
on the load of individual nodes, racks, etc.
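To illustrate how the "good target" check can leave two replicas on the local rack, here is a rough Python sketch. It is not Hadoop's actual code: the Node fields, the choose_targets helper, the thresholds, and the "first suitable candidate wins" selection are simplifying assumptions of mine (the real policy picks candidates at random and its checks differ in detail).

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    rack: str
    remaining_bytes: int
    active_transfers: int = 0
    decommissioning: bool = False


def is_good_target(node, chosen, block_size, avg_transfers, max_per_rack):
    """Simplified stand-in for the checks a candidate has to pass."""
    if node.decommissioning:
        return False
    if node.remaining_bytes < block_size:          # not enough space left
        return False
    if node.active_transfers > 2 * avg_transfers:  # node is too busy
        return False
    on_same_rack = sum(1 for c in chosen if c.rack == node.rack)
    return on_same_rack < max_per_rack             # rack not over-represented


def choose_targets(writer, nodes, block_size, max_per_rack=2):
    """Pick up to three targets, falling back when the preferred rack fails."""
    avg = sum(n.active_transfers for n in nodes) / len(nodes)
    chosen = []

    def pick(*groups):
        # Try each candidate group in order; take the first good node.
        for group in groups:
            for n in group:
                if n not in chosen and is_good_target(
                        n, chosen, block_size, avg, max_per_rack):
                    chosen.append(n)
                    return True
        return False

    def on_rack(rack):
        return [n for n in nodes if n.rack == rack]

    def off_rack(rack):
        return [n for n in nodes if n.rack != rack]

    # Replica 1: the writer's node, else its rack, else anywhere.
    if not pick([writer], on_rack(writer.rack), nodes):
        return chosen
    first = chosen[0]
    # Replica 2: a different rack, else anywhere.
    if not pick(off_rack(first.rack), nodes):
        return chosen
    second = chosen[1]
    # Replica 3: same rack as replica 2 (or a remote rack if the first two
    # ended up together), else anywhere.
    if second.rack == first.rack:
        pick(off_rack(first.rack), nodes)
    else:
        pick(on_rack(second.rack), nodes)
    return chosen


MB = 1024 * 1024
nodes = [
    Node("r1n1", "/rack1", 500 * MB),
    Node("r1n2", "/rack1", 500 * MB),
    Node("r1n3", "/rack1", 500 * MB),
    Node("r2n1", "/rack2", 500 * MB),
    Node("r2n2", "/rack2", 10 * MB),   # nearly full
    Node("r2n3", "/rack2", 10 * MB),   # nearly full
]
placement = choose_targets(nodes[0], nodes, block_size=64 * MB)
print([(n.name, n.rack) for n in placement])
```

With two of the three nodes on the remote rack nearly full, the third replica cannot go next to the second one and falls back to the local rack, which gives the two-local, one-remote layout you describe.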
> Likewise, at times some blocks are read from nodes in the remote rack
> instead of nodes in the local rack. Why does it happen?
This is surprising. I am not sure the topology is configured correctly.
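One thing worth checking: in 0.20.x the rack mapping comes from the script named by the topology.script.file.name property in core-site.xml. Hadoop calls that script with one or more IP addresses or hostnames as arguments and expects one rack path per argument on stdout. Below is a minimal sketch of such a script in Python; the addresses and rack names are hypothetical placeholders, not taken from your cluster.

```python
#!/usr/bin/env python3
"""Minimal rack-mapping script sketch; the table below is hypothetical."""
import sys

# Hypothetical address-to-rack table; replace with the real datanode addresses.
RACKS = {
    "10.0.1.11": "/rack1",
    "10.0.1.12": "/rack1",
    "10.0.1.13": "/rack1",
    "10.0.2.21": "/rack2",
    "10.0.2.22": "/rack2",
    "10.0.2.23": "/rack2",
}
DEFAULT_RACK = "/default-rack"


def resolve(hosts):
    """Return one rack path per host, in the same order as the input."""
    return [RACKS.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # One rack path per line, one line per argument passed in.
    for rack in resolve(sys.argv[1:]):
        print(rack)
```

If the script is given names it does not know (for example hostnames when the table holds IPs), every node lands in /default-rack and reads can no longer prefer the local rack. Running "hadoop fsck / -files -blocks -racks" prints the rack next to each replica location, which makes a wrong mapping easy to spot.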
> Another thing: if I have two datacenters and two racks for each of them
> (so a hierarchical network topology), where are the two remote replicas
> stored? Does Hadoop consider the hierarchy and store one replica in the
> local datacenter and two replicas in the other datacenter? Or are the two
> replicas stored in a totally random rack?
Hadoop clusters are not spread across datacenters.