MapReduce >> mail # user >> Map Tasks do not obey data locality principle........


Agarwal, Nikhil 2013-05-14, 10:55
Re: Map Tasks do not obey data locality principle........
Hi Nikhil,

Which scheduler are you using?

-Sandy
On Tue, May 14, 2013 at 3:55 AM, Agarwal, Nikhil
<[EMAIL PROTECTED]> wrote:

> Hi,
>
> I have a 3-node cluster, with the JobTracker running on one machine and
> TaskTrackers on the other two (say, slave1 and slave2). Instead of using
> HDFS, I have written my own FileSystem implementation. Unlike HDFS, it
> cannot provide a shared filesystem view to the JobTracker and the
> TaskTrackers, so I mounted the root container of slave2 on a directory in
> slave1 (NFS mount). With this setup I can submit an MR job to the
> JobTracker with an input path such as my_scheme://slave1_IP:Port/dir1.
> The job runs successfully, but data locality is not ensured. For example,
> if files A, B, C are kept on slave1 and D, E, F on slave2, then according
> to data locality the map tasks for A, B, C should go to the TaskTracker
> on slave1 and those for D, E, F to the one on slave2. Instead, the map
> tasks are scheduled on either TaskTracker, seemingly at random. If the map
> task for file A is submitted to the TaskTracker on slave2, file A has to
> be fetched over the network by slave2.
>
> How do I prevent this from happening?
>
> Thanks,
> Nikhil
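
One possible cause, offered as a guess rather than a diagnosis: the scheduler can only place a map task near its data if each input split carries host hints, and those hints come from the FileSystem's getFileBlockLocations(). In stock Hadoop the base FileSystem class reports "localhost" for every file unless the method is overridden, so a custom FileSystem that does not override it gives the scheduler nothing to match against. The sketch below is a toy model of that effect, not the JobTracker's actual code. The class LocalitySketch and the methods hints, pickSplit and schedule are hypothetical names made up for illustration, and the round-robin heartbeat order is an assumption.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical names, not Hadoop classes.
public class LocalitySketch {

    // Builds host hints for the six files from the question:
    // A, B, C report hostABC and D, E, F report hostDEF.
    static Map<String, List<String>> hints(String hostABC, String hostDEF) {
        Map<String, List<String>> m = new LinkedHashMap<>();
        for (String f : new String[] {"A", "B", "C"}) m.put(f, Arrays.asList(hostABC));
        for (String f : new String[] {"D", "E", "F"}) m.put(f, Arrays.asList(hostDEF));
        return m;
    }

    // Prefers a pending split whose hints name the requesting tracker;
    // otherwise falls back to the first pending split (non-local).
    static String pickSplit(String tracker, Map<String, List<String>> pending) {
        for (Map.Entry<String, List<String>> e : pending.entrySet()) {
            if (e.getValue().contains(tracker)) return e.getKey();
        }
        return pending.keySet().iterator().next();
    }

    // Trackers ask for work in round-robin order (an assumption of this
    // sketch) until no split is left. Returns split -> tracker.
    static Map<String, String> schedule(Map<String, List<String>> splitHosts,
                                        List<String> trackers) {
        Map<String, List<String>> pending = new LinkedHashMap<>(splitHosts);
        Map<String, String> assignment = new LinkedHashMap<>();
        int turn = 0;
        while (!pending.isEmpty()) {
            String tracker = trackers.get(turn++ % trackers.size());
            String split = pickSplit(tracker, pending);
            pending.remove(split);
            assignment.put(split, tracker);
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<String> trackers = Arrays.asList("slave1", "slave2");
        // Hints that name the real hosts.
        System.out.println(schedule(hints("slave1", "slave2"), trackers));
        // Hints that all say "localhost", as the base FileSystem would report.
        System.out.println(schedule(hints("localhost", "localhost"), trackers));
    }
}
```

In this model, accurate hints send A, B, C to slave1 and D, E, F to slave2, while "localhost" hints match neither tracker, so files are handed out in arrival order and B, for instance, ends up on slave2. If that matches what you see, overriding getFileBlockLocations() in the custom FileSystem to return the real host names would be the place to look.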
>
Agarwal, Nikhil 2013-05-16, 06:08
Harsh J 2013-05-16, 06:17
Agarwal, Nikhil 2013-05-16, 06:21
Harsh J 2013-05-16, 06:52