MapReduce >> mail # user >> MapReduce - FileInputFormat and Locality

Re: MapReduce - FileInputFormat and Locality
I think you misread it.

If a given split has only one block, it uses all the locations of that block.

If it so happens that a given split has multiple blocks, it uses all the locations of the first block.
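To make the rule concrete, here is a minimal sketch in plain Java with no Hadoop dependency. The `Block` class and the `splitHosts` helper are hypothetical stand-ins invented for illustration, not Hadoop's own types or code. The sketch assumes the "first block" is the one containing the split's starting offset.

```java
import java.util.Arrays;
import java.util.List;

class FirstBlockLocality {
    // Hypothetical stand-in for a block location: a byte range plus its replica hosts.
    static final class Block {
        final long offset;
        final long length;
        final List<String> hosts;

        Block(long offset, long length, List<String> hosts) {
            this.offset = offset;
            this.length = length;
            this.hosts = hosts;
        }
    }

    // Returns every replica host of the block that contains the split's first byte.
    // Blocks that the split spills into afterwards do not contribute any hosts.
    static List<String> splitHosts(List<Block> blocks, long splitStart) {
        for (Block b : blocks) {
            if (splitStart >= b.offset && splitStart < b.offset + b.length) {
                return b.hosts;
            }
        }
        throw new IllegalArgumentException("offset " + splitStart + " is outside the file");
    }

    public static void main(String[] args) {
        List<Block> blocks = Arrays.asList(
            new Block(0L, 128L, Arrays.asList("h1", "h2", "h3")),
            new Block(128L, 128L, Arrays.asList("h4", "h5", "h6")));

        // A split covering only the first block reports all of that block's hosts.
        System.out.println(splitHosts(blocks, 0L));
        // A split starting at offset 64 that runs into the second block still
        // reports only the first block's hosts.
        System.out.println(splitHosts(blocks, 64L));
    }
}
```

In both calls the result is the full replica list of the first block, which is the point: the split is not limited to a single node, but later blocks are ignored.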

+Vinod Kumar Vavilapalli
Hortonworks Inc.
On May 8, 2013, at 7:21 AM, Brian C. Huffman wrote:

> All,
> I'm trying to understand how the current FileInputFormat implements locality. As far as I can tell, it calculates splits using getSplits and each split will contain the node that hosts the first block of data in that split. Is my understanding correct?
> Looking at the FileInputFormat for the old API (mapred), it appears that it does more to implement locality, using getSplitHosts to "return the hosts that contribute most for a given split".
> If I understand correctly, why was this changed?
> Thanks,
> Brian
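
For comparison, the "hosts that contribute most" idea that Brian quotes from the old API can be sketched roughly as follows. This is a simplified illustration in plain Java, not the actual getSplitHosts code: the `Block` class, the `topHosts` helper, and the `limit` parameter are hypothetical, and the sketch ignores any rack or topology awareness the real method may apply. It only ranks hosts by how many bytes of the split they hold locally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MostContributingHosts {
    // Hypothetical stand-in for a block location: a byte range plus its replica hosts.
    static final class Block {
        final long offset;
        final long length;
        final List<String> hosts;

        Block(long offset, long length, List<String> hosts) {
            this.offset = offset;
            this.length = length;
            this.hosts = hosts;
        }
    }

    // Ranks hosts by the number of split bytes they store and returns the top `limit`.
    static List<String> topHosts(List<Block> blocks, long start, long len, int limit) {
        final Map<String, Long> bytesByHost = new HashMap<>();
        long end = start + len;
        for (Block b : blocks) {
            // Bytes of this block that fall inside [start, end).
            long overlap = Math.min(end, b.offset + b.length) - Math.max(start, b.offset);
            if (overlap <= 0) {
                continue;
            }
            for (String h : b.hosts) {
                bytesByHost.merge(h, overlap, Long::sum);
            }
        }
        List<String> hosts = new ArrayList<>(bytesByHost.keySet());
        // Largest contribution first; ties broken by host name so the order is deterministic.
        hosts.sort((x, y) -> {
            int c = Long.compare(bytesByHost.get(y), bytesByHost.get(x));
            return c != 0 ? c : x.compareTo(y);
        });
        return new ArrayList<>(hosts.subList(0, Math.min(limit, hosts.size())));
    }

    public static void main(String[] args) {
        List<Block> blocks = Arrays.asList(
            new Block(0L, 100L, Arrays.asList("a", "b")),
            new Block(100L, 100L, Arrays.asList("b", "c")));

        // Split [50, 200): 50 bytes come from the first block, 100 from the second.
        // Host b holds 150 bytes, c holds 100, a holds 50.
        System.out.println(topHosts(blocks, 50L, 150L, 2));
    }
}
```

For this split the first-block rule would report hosts a and b, while the byte-weighted ranking prefers b and c, which shows how the two approaches can disagree once a split spans more than one block.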