MapReduce >> mail # user >> MapReduce - FileInputFormat and Locality


Brian C. Huffman 2013-05-08, 14:21
Re: MapReduce - FileInputFormat and Locality
I think you misread it.

If a given split has only one block, it uses all the locations of that block.

If it so happens that a given split has multiple blocks, it uses all the locations of the first block.
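
To make the difference concrete, here is a minimal sketch in plain Java. It is a toy model, not Hadoop code: the class, the Block type and both method names are hypothetical and exist only for illustration. The first method mirrors the behaviour described above (all replica hosts of the block that holds the split's first byte). The second is a simplified stand-in for the "hosts that contribute most" idea mentioned in the question; it ranks hosts by local bytes only, and the real getSplitHosts may weigh other factors such as network topology.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of split locality. Not Hadoop code; every name here is hypothetical. */
final class SplitLocalitySketch {

    /** One block of a file: its byte range plus the hosts holding a replica. */
    static final class Block {
        final long offset;
        final long length;
        final List<String> hosts;

        Block(long offset, long length, String... hosts) {
            this.offset = offset;
            this.length = length;
            this.hosts = Arrays.asList(hosts);
        }
    }

    /** Behaviour described in the reply: all replica hosts of the block containing the split's first byte. */
    static List<String> hostsOfFirstBlock(List<Block> blocks, long splitStart) {
        for (Block b : blocks) {
            if (splitStart >= b.offset && splitStart < b.offset + b.length) {
                return b.hosts;
            }
        }
        throw new IllegalArgumentException("offset " + splitStart + " is outside the file");
    }

    /** Simplified alternative: rank hosts by how many bytes of the split they hold, keep the top 'limit'. */
    static List<String> hostsByContribution(List<Block> blocks, long splitStart, long splitLength, int limit) {
        long splitEnd = splitStart + splitLength;
        Map<String, Long> bytesPerHost = new LinkedHashMap<>();
        for (Block b : blocks) {
            // Bytes of this block that fall inside the split (<= 0 means no overlap).
            long overlap = Math.min(splitEnd, b.offset + b.length) - Math.max(splitStart, b.offset);
            if (overlap <= 0) {
                continue;
            }
            for (String host : b.hosts) {
                bytesPerHost.merge(host, overlap, Long::sum);
            }
        }
        List<String> ranked = new ArrayList<>(bytesPerHost.keySet());
        // Stable sort, descending by local bytes; ties keep first-seen order.
        ranked.sort((x, y) -> Long.compare(bytesPerHost.get(y), bytesPerHost.get(x)));
        return ranked.subList(0, Math.min(limit, ranked.size()));
    }

    public static void main(String[] args) {
        List<Block> blocks = Arrays.asList(
                new Block(0, 100, "n1", "n2", "n3"),
                new Block(100, 100, "n3", "n4", "n5"),
                new Block(200, 100, "n4", "n5", "n6"));
        // Split of 160 bytes starting at offset 90: 10 bytes of block 0, all of block 1, 50 bytes of block 2.
        System.out.println(hostsOfFirstBlock(blocks, 90));          // prints [n1, n2, n3]
        System.out.println(hostsByContribution(blocks, 90, 160, 3)); // prints [n4, n5, n3]
    }
}
```

In this made-up layout the split starts in the last 10 bytes of the first block, so the first-block rule reports n1, n2 and n3, while ranking by bytes favours n4 and n5, which hold 150 bytes of the split each. When splits line up with block boundaries (one block per split), both approaches return the same hosts.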

HTH,
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/
On May 8, 2013, at 7:21 AM, Brian C. Huffman wrote:

> All,
>
> I'm trying to understand how the current FileInputFormat implements locality. As far as I can tell, it calculates splits using getSplits, and each split will contain the node that hosts the first block of data in that split. Is my understanding correct?
>
> Looking at the FileInputFormat for the old API (mapred), it appears that it does more to implement locality, using getSplitHosts to "return the hosts that contribute most for a given split".
>
> If I understand correctly, why was this changed?
>
> Thanks,
> Brian
>

Ted Dunning 2013-05-09, 02:10