That is my understanding. The default strategy is to avoid a network
transfer by placing the first replica on the same server that executed the
HDFS client code (i.e. in your case the map or reduce task). If writing to
the 'local' node is not possible, then I believe a random node will be
chosen.
If you want to learn more about this, I suggest looking at the policies:
Also, there is now a way to plug in your own policy via
dfs.block.replicator.classname. I'm not familiar with this, but you can
read about it at https://issues.apache.org/jira/browse/HDFS-385
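
As a rough illustration of the default behaviour described at the top of
this reply, here is a toy model in Java. It is only a sketch of "prefer the
writer's node, otherwise pick a random one" and is not Hadoop's actual
placement code. The class name FirstReplicaSketch, the method
chooseFirstReplica and the list of live datanodes are made up for this
example.

```java
import java.util.List;
import java.util.Random;

// Toy model only: this is NOT Hadoop's block placement implementation.
// All names here are hypothetical and exist just for illustration.
class FirstReplicaSketch {

    // Picks the node that receives the first replica of one block.
    // writerNode:    host running the HDFS client (e.g. the map or reduce task)
    // liveDataNodes: hosts assumed to be able to accept a block right now
    static String chooseFirstReplica(String writerNode,
                                     List<String> liveDataNodes,
                                     Random rng) {
        if (liveDataNodes.isEmpty()) {
            throw new IllegalStateException("no datanode available");
        }
        // Prefer the writer's own node so the block needs no network transfer.
        if (liveDataNodes.contains(writerNode)) {
            return writerNode;
        }
        // The local node cannot take the block: fall back to a random node.
        return liveDataNodes.get(rng.nextInt(liveDataNodes.size()));
    }
}
```

With a replication factor of 1, each block of the file has only this first
replica. Under the assumptions of the sketch, every block would therefore
end up on the task's node for as long as that node can accept writes.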
From: Lukas Kairies [mailto:[EMAIL PROTECTED]]
Sent: Friday, July 26, 2013 6:45 AM
To: [EMAIL PROTECTED]
Subject: HDFS block placement
I am a bit confused about block placement in Hadoop. Assume that there
is no replication and a task (map or reduce) writes a file to HDFS. Will
all blocks be stored on the same local node (the node on which the task
runs)? I think so, but I am not sure.