MapReduce, mail # user - MapReduce - FileInputFormat and Locality


MapReduce - FileInputFormat and Locality
Brian C. Huffman 2013-05-08, 14:21
All,

I'm trying to understand how the current FileInputFormat (new API,
mapreduce) implements locality.  As far as I can tell, it calculates
splits using getSplits, and each split records the node that hosts the
first block of data in that split.  Is my understanding correct?

Looking at the FileInputFormat for the old API (mapred), it appears that
it does more to implement locality, using getSplitHosts to "return the
hosts that contribute most for a given split".
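
To make the two strategies in the question concrete, here is a minimal, self-contained sketch in plain Java. It does not use or reproduce Hadoop code: BlockInfo, SplitLocality, firstBlockHosts and topContributingHosts are hypothetical names invented for illustration, and the sketch ignores rack topology, which the real getSplitHosts may take into account.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a block location (not a Hadoop class): the number
// of bytes of the split that fall in this block, plus the hosts with a replica.
class BlockInfo {
    final long bytesInSplit;
    final List<String> hosts;

    BlockInfo(long bytesInSplit, String... hosts) {
        this.bytesInSplit = bytesInSplit;
        this.hosts = Arrays.asList(hosts);
    }
}

class SplitLocality {
    // Strategy 1: report only the hosts of the first block in the split.
    static List<String> firstBlockHosts(List<BlockInfo> blocks) {
        return new ArrayList<>(blocks.get(0).hosts);
    }

    // Strategy 2: rank hosts by how many bytes of the split they hold locally
    // and return at most `limit` of them.
    static List<String> topContributingHosts(List<BlockInfo> blocks, int limit) {
        Map<String, Long> bytesPerHost = new HashMap<>();
        for (BlockInfo block : blocks) {
            for (String host : block.hosts) {
                bytesPerHost.merge(host, block.bytesInSplit, Long::sum);
            }
        }
        List<String> ranked = new ArrayList<>(bytesPerHost.keySet());
        // Sort by local bytes, descending; break ties by host name so the
        // result is deterministic.
        ranked.sort((a, b) -> {
            int byBytes = Long.compare(bytesPerHost.get(b), bytesPerHost.get(a));
            return byBytes != 0 ? byBytes : a.compareTo(b);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(limit, ranked.size())));
    }
}
```

For a split that covers 16 MB of a block on nodeA, nodeB, nodeC and 112 MB of the next block on nodeC, nodeD, nodeE, the first strategy reports nodeA, nodeB, nodeC, while the second reports nodeC, nodeD, nodeE, which together hold far more of the split's data locally.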

If I understand correctly, why was this changed?

Thanks,
Brian