MapReduce, mail # user - Map Tasks do not obey data locality principle........


Re: Map Tasks do not obey data locality principle........
Sandy Ryza 2013-05-15, 20:49
Hi Nikhil,

Which scheduler are you using?

-Sandy
On Tue, May 14, 2013 at 3:55 AM, Agarwal, Nikhil
<[EMAIL PROTECTED]> wrote:

> Hi,
>
> I have a 3-node cluster, with the JobTracker running on one machine and
> TaskTrackers on the other two (say, slave1 and slave2). Instead of using
> HDFS, I have written my own FileSystem implementation. Unlike HDFS, it
> cannot give the JobTracker and TaskTrackers a shared filesystem view, so
> I mounted the root container of slave2 on a directory in slave1 (NFS
> mount). With this setup I can submit an MR job to the JobTracker with an
> input path such as my_scheme://slave1_IP:Port/dir1. The job runs
> successfully, but data locality is not honored. If files A, B, C are
> kept on slave1 and D, E, F on slave2, then the map tasks for A, B, C
> should go to the TaskTracker on slave1 and those for D, E, F to the one
> on slave2. Instead, map tasks are scheduled on either TaskTracker,
> seemingly at random. If the map task for file A runs on the TaskTracker
> on slave2, then slave2 has to fetch file A over the network.
>
> How do I prevent this from happening?
>
> Thanks,
> Nikhil
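
One thing that may be worth checking, although nothing in this thread confirms it as the cause: MapReduce learns where a split's data lives from FileSystem.getFileBlockLocations(), and the base-class default reports "localhost" as the only host. A custom FileSystem that does not override it gives the scheduler no host to match against a TaskTracker. The sketch below is a simplified, hypothetical model of that matching step, not Hadoop code; LocalitySketch and pickTracker are made-up names, and it assumes at least one free tracker.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of locality-aware map task placement.
// None of these names are Hadoop APIs.
class LocalitySketch {

    // Returns the first free tracker whose host appears in the split's
    // preferred hosts; otherwise falls back to the first free tracker.
    // Assumes freeTrackers is non-empty.
    static String pickTracker(List<String> splitHosts, List<String> freeTrackers) {
        for (String tracker : freeTrackers) {
            if (splitHosts.contains(tracker)) {
                return tracker; // data-local placement
            }
        }
        return freeTrackers.get(0); // no host matched: placement is arbitrary
    }

    public static void main(String[] args) {
        List<String> trackers = Arrays.asList("slave1", "slave2");

        // Split for file D reports its real host, so it can be placed locally.
        System.out.println(pickTracker(Arrays.asList("slave2"), trackers));

        // Split reports only "localhost", so no tracker matches and the
        // fallback is used.
        System.out.println(pickTracker(Arrays.asList("localhost"), trackers));
    }
}
```

In a real FileSystem subclass, the analogous change would be to override getFileBlockLocations() so that each BlockLocation carries the host that actually stores the file. The host strings probably need to match the names the TaskTrackers register under (hostname versus IP address), which is an assumption to verify on the cluster.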