MapReduce - FileInputFormat and Locality

I'm trying to understand how the current FileInputFormat (the new API,
mapreduce) implements locality.  As far as I can tell, it calculates splits
in getSplits, and each split records only the hosts of the first block of
data in that split.  Is my understanding correct?

Looking at the FileInputFormat for the old API (mapred), it appears to do
more to implement locality: it uses getSplitHosts to "return the
hosts that contribute most for a given split".

If my understanding is correct, why was this changed?
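
To make the difference concrete, here is a minimal sketch of the two strategies as I understand them. It is not Hadoop's code: the class SplitLocalitySketch, the Block type, and both method names are hypothetical, it has no Hadoop dependencies, and it ignores rack topology, which the real getSplitHosts may also take into account.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SplitLocalitySketch {

    /** Hypothetical stand-in for a block location: a byte range plus its replica hosts. */
    public static final class Block {
        final long offset;
        final long length;
        final List<String> hosts;

        public Block(long offset, long length, List<String> hosts) {
            this.offset = offset;
            this.length = length;
            this.hosts = hosts;
        }
    }

    /** Strategy 1 (assumed new-API behaviour): hosts of the block holding the split's first byte. */
    public static List<String> firstBlockHosts(List<Block> blocks, long splitStart) {
        for (Block b : blocks) {
            if (splitStart >= b.offset && splitStart < b.offset + b.length) {
                return b.hosts;
            }
        }
        throw new IllegalArgumentException("Offset outside file: " + splitStart);
    }

    /** Strategy 2 (simplified "contribute most"): rank hosts by bytes of the split they store. */
    public static List<String> topContributingHosts(
            List<Block> blocks, long splitStart, long splitLength, int maxHosts) {
        long splitEnd = splitStart + splitLength;
        // LinkedHashMap keeps first-seen order so that ties are broken deterministically.
        Map<String, Long> bytesPerHost = new LinkedHashMap<>();
        for (Block b : blocks) {
            long overlap = Math.min(splitEnd, b.offset + b.length) - Math.max(splitStart, b.offset);
            if (overlap <= 0) {
                continue; // this block does not intersect the split
            }
            for (String host : b.hosts) {
                bytesPerHost.merge(host, overlap, Long::sum);
            }
        }
        List<String> ranked = new ArrayList<>(bytesPerHost.keySet());
        // Stable sort, largest contribution first.
        ranked.sort(Comparator.comparingLong((String h) -> bytesPerHost.get(h)).reversed());
        return new ArrayList<>(ranked.subList(0, Math.min(maxHosts, ranked.size())));
    }

    /** Two 100-byte blocks with overlapping replica sets (made-up example data). */
    public static List<Block> exampleBlocks() {
        return Arrays.asList(
                new Block(0, 100, Arrays.asList("nodeA", "nodeB", "nodeC")),
                new Block(100, 100, Arrays.asList("nodeB", "nodeC", "nodeD")));
    }

    public static void main(String[] args) {
        List<Block> blocks = exampleBlocks();
        // A split covering bytes [80, 200): 20 bytes of block 0, all of block 1.
        System.out.println(firstBlockHosts(blocks, 80));               // [nodeA, nodeB, nodeC]
        System.out.println(topContributingHosts(blocks, 80, 120, 3));  // [nodeB, nodeC, nodeD]
    }
}
```

In this made-up case the first-block strategy reports nodeA, which holds only 20 of the split's 120 bytes, while the contribution-based ranking drops nodeA in favour of nodeD, which holds 100.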

Vinod Kumar Vavilapalli 2013-05-08, 22:00
Ted Dunning 2013-05-09, 02:10