MapReduce >> mail # user >> MapReduce - FileInputFormat and Locality


Brian C. Huffman 2013-05-08, 14:21
Re: MapReduce - FileInputFormat and Locality
I think you misread it.

If a given split has only one block, it uses all the locations of that block.

If it so happens that a given split has multiple blocks, it uses all the locations of the first block.

HTH,
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/
On May 8, 2013, at 7:21 AM, Brian C. Huffman wrote:

> All,
>
> I'm trying to understand how the current FileInputFormat implements locality. As far as I can tell, it calculates splits using getSplits, and each split will contain the node that hosts the first block of data in that split. Is my understanding correct?
>
> Looking at the FileInputFormat for the old API (mapred), it appears that it does more to implement locality, using getSplitHosts to "return the hosts that contribute most for a given split".
>
> If I understand correctly, why was this changed?
>
> Thanks,
> Brian
>
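
To make the difference between the two behaviours discussed in this thread concrete, here is a minimal, self-contained sketch. It is not the Hadoop source and does not use any Hadoop classes: `SplitLocalitySketch`, `Block`, `firstBlockHosts` and `topContributingHosts` are hypothetical names made up for illustration, and the "contributes most" variant ignores rack topology, which a real implementation may take into account.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a toy model of split locality, not Hadoop code.
final class SplitLocalitySketch {

    /** Hypothetical stand-in for a block location: byte range plus replica hosts. */
    static final class Block {
        final long offset;
        final long length;
        final String[] hosts;

        Block(long offset, long length, String... hosts) {
            this.offset = offset;
            this.length = length;
            this.hosts = hosts;
        }
    }

    /** Strategy from the reply: all hosts of the block that contains the split's start. */
    static String[] firstBlockHosts(List<Block> blocks, long splitStart) {
        for (Block blk : blocks) {
            if (splitStart >= blk.offset && splitStart < blk.offset + blk.length) {
                return blk.hosts.clone();
            }
        }
        throw new IllegalArgumentException("Offset " + splitStart + " is outside the file");
    }

    /** Alternative: rank hosts by how many bytes of the split they hold, keep the top ones. */
    static String[] topContributingHosts(List<Block> blocks, long splitStart,
                                         long splitLength, int maxHosts) {
        long splitEnd = splitStart + splitLength;
        Map<String, Long> bytesPerHost = new HashMap<>();
        for (Block blk : blocks) {
            // Number of bytes of this block that fall inside the split.
            long overlap = Math.min(splitEnd, blk.offset + blk.length)
                         - Math.max(splitStart, blk.offset);
            if (overlap <= 0) {
                continue;
            }
            for (String host : blk.hosts) {
                bytesPerHost.merge(host, overlap, Long::sum);
            }
        }
        List<Map.Entry<String, Long>> ranked = new ArrayList<>(bytesPerHost.entrySet());
        // Most bytes first; ties broken by host name so the result is deterministic.
        ranked.sort((x, y) -> {
            int byBytes = Long.compare(y.getValue(), x.getValue());
            return byBytes != 0 ? byBytes : x.getKey().compareTo(y.getKey());
        });
        int n = Math.min(maxHosts, ranked.size());
        String[] result = new String[n];
        for (int i = 0; i < n; i++) {
            result[i] = ranked.get(i).getKey();
        }
        return result;
    }

    public static void main(String[] args) {
        // Two 100-byte blocks; the split covers 20 bytes of the first and all of the second.
        List<Block> blocks = Arrays.asList(
                new Block(0, 100, "a", "b", "c"),
                new Block(100, 100, "c", "d", "e"));
        System.out.println(Arrays.toString(firstBlockHosts(blocks, 80)));               // [a, b, c]
        System.out.println(Arrays.toString(topContributingHosts(blocks, 80, 120, 3))); // [c, d, e]
    }
}
```

In this toy case the split starts near the end of the first block, so the first-block strategy reports hosts a, b and c even though hosts d and e hold far more of the split's data. When a split covers exactly one block, the two strategies return the same set of hosts.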

Ted Dunning 2013-05-09, 02:10