Brian C. Huffman 2013-05-08, 14:21
I think you misread it.
If a given split has only one block, it uses all the locations of that block.
If a split happens to span multiple blocks, it uses all the locations of the first block only.
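To make the rule above concrete, here is a minimal sketch in plain Java. It is not the Hadoop source: the class name, the `hostsForSplit` helper, and the list-of-lists representation of block locations are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustration only: hypothetical names, no Hadoop classes involved.
class FirstBlockLocality {

    // blockHosts.get(i) holds the hosts that store a replica of the
    // i-th block covered by the split (assumed representation).
    static List<String> hostsForSplit(List<List<String>> blockHosts) {
        if (blockHosts.isEmpty()) {
            return Collections.emptyList();
        }
        // One block or many, only the first block's locations are used.
        return new ArrayList<>(blockHosts.get(0));
    }

    public static void main(String[] args) {
        List<List<String>> singleBlock = Arrays.asList(
                Arrays.asList("node1", "node2", "node3"));
        List<List<String>> twoBlocks = Arrays.asList(
                Arrays.asList("node1", "node2", "node3"),
                Arrays.asList("node4", "node5", "node6"));

        System.out.println(hostsForSplit(singleBlock));
        System.out.println(hostsForSplit(twoBlocks));
    }
}
```

Both calls report the same three hosts, because the second block's replicas are never consulted.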
+Vinod Kumar Vavilapalli
On May 8, 2013, at 7:21 AM, Brian C. Huffman wrote:
> I'm trying to understand how the current FileInputFormat implements locality. As far as I can tell, it calculates splits using getSplits, and each split will contain the node that hosts the first block of data in that split. Is my understanding correct?
> Looking at the FileInputFormat for the old API (mapred), it appears that it does more to implement locality, using getSplitHosts to "return the hosts that contribute most for a given split".
> If I understand correctly, why was this changed?
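For comparison, the "hosts that contribute most" idea quoted from the old API can be sketched roughly as follows. This is an assumption-laden illustration, not the actual getSplitHosts code: the class, the `topHosts` helper, and its parameters are hypothetical, and the sketch ranks hosts purely by bytes contributed, ignoring anything else the real method may weigh (such as rack topology).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: hypothetical names, no Hadoop classes involved.
class ContributionLocality {

    // bytesInSplit[i] is how many bytes of block i fall inside the split;
    // blockHosts.get(i) lists the hosts holding a replica of block i.
    static List<String> topHosts(long[] bytesInSplit,
                                 List<List<String>> blockHosts,
                                 int maxHosts) {
        // Sum, per host, the bytes of the split it could serve locally.
        Map<String, Long> contribution = new LinkedHashMap<>();
        for (int i = 0; i < blockHosts.size(); i++) {
            for (String host : blockHosts.get(i)) {
                contribution.merge(host, bytesInSplit[i], Long::sum);
            }
        }
        // Sort hosts by contribution, largest first (stable for ties).
        List<String> hosts = new ArrayList<>(contribution.keySet());
        hosts.sort((a, b) -> Long.compare(contribution.get(b), contribution.get(a)));
        return new ArrayList<>(hosts.subList(0, Math.min(maxHosts, hosts.size())));
    }

    public static void main(String[] args) {
        long[] bytes = {100L, 28L};
        List<List<String>> hosts = Arrays.asList(
                Arrays.asList("node1", "node2"),
                Arrays.asList("node2", "node3"));
        System.out.println(topHosts(bytes, hosts, 2));
    }
}
```

In this example node2 holds replicas of both blocks, so it ranks ahead of node1, whereas a first-block-only rule would treat node1 and node2 as equally good.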
Ted Dunning 2013-05-09, 02:10