I have a question about reduce tasks and block writes. Does a reduce task
always write first to HDFS on the node where it is running (with the blocks
then being replicated to other nodes)?
If so, suppose I have a cluster of 5 nodes: 4 of them run a DN (DataNode)
and a TT (TaskTracker), and one (node A) runs only a DN. When running MR
jobs, would map tasks never read from node A? My reasoning is that maps are
scheduled for data locality, and if reduce tasks write first to the node
where they run, one replica of each block would always be on a node that
has a TT. Node A would then only contain blocks created through replication
by the framework, since no reduce task would write there directly. Is this
correct?
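
To make the scenario concrete, here is a minimal sketch of what I am assuming. It is a toy model, not the real HDFS placement code. The node names, the helper functions, the replication factor of 3 and the single-rack random choice for the remaining replicas are all my own assumptions for illustration.

```python
import random

# Hypothetical cluster: four nodes run DataNode + TaskTracker,
# node "A" runs only a DataNode (no tasks are ever scheduled there).
TT_NODES = ["n1", "n2", "n3", "n4"]
ALL_DATANODES = TT_NODES + ["A"]


def place_block(writer, rng, replication=3):
    """Toy placement: first replica on the writer's own node (the assumption
    being asked about), remaining replicas on other randomly chosen nodes."""
    others = [n for n in ALL_DATANODES if n != writer]
    return [writer] + rng.sample(others, replication - 1)


def simulate(num_blocks=1000, seed=0):
    """Write num_blocks blocks, each from a reduce task on a random TT node."""
    rng = random.Random(seed)
    blocks = []
    for _ in range(num_blocks):
        writer = rng.choice(TT_NODES)  # reducers only run where a TT exists
        blocks.append(place_block(writer, rng))
    return blocks


blocks = simulate()

# How often node A holds the first replica vs. any replica at all.
first_on_a = sum(1 for b in blocks if b[0] == "A")
any_on_a = sum(1 for b in blocks if "A" in b)

# True if every block has at least one replica on a node with a TT,
# i.e. a node-local map task is possible for every block.
local_possible = all(any(n in TT_NODES for n in b) for b in blocks)

print("blocks with first replica on A:", first_on_a)
print("blocks with some replica on A:", any_on_a)
print("every block has a replica on a TT node:", local_possible)
```

In this model node A never receives a first replica and every block has a replica on a TT node, which is the situation I describe above. Whether the real scheduler would then always pick a node-local map is exactly what I am unsure about.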
Thanks in advance