MapReduce >> mail # user >> Non data-local scheduling


Re: Non data-local scheduling
Hi Andre,

Try setting yarn.scheduler.capacity.node-locality-delay to a number between
0 and 1. This turns on delay scheduling. Here's the doc on how it works:

For applications that request containers on particular nodes, the number of
scheduling opportunities since the last container assignment to wait before
accepting a placement on another node. Expressed as a float between 0 and
1, which, as a fraction of the cluster size, is the number of scheduling
opportunities to pass up. The default value of -1.0 means don't pass up any
scheduling opportunities.
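A minimal Python sketch of the delay-scheduling rule described in the doc text above, assuming a threshold between 0 and 1 that is multiplied by the cluster size to get the number of non-local offers to pass up. `AppLocalityState`, `allowed_skips` and `offer_node` are hypothetical names, not YARN classes, and rounding the skip count up is an assumption. Note that this fraction-of-cluster wording matches the Fair Scheduler's `yarn.scheduler.fair.locality.threshold.node` description; in CapacityScheduler releases `node-locality-delay` may instead be an integer count of missed opportunities, so check the `capacity-scheduler.xml` docs for your version.

```python
import math
from dataclasses import dataclass


@dataclass
class AppLocalityState:
    """Hypothetical per-application state for delay scheduling."""
    # Non-local offers passed up since the last container assignment.
    missed_opportunities: int = 0


def allowed_skips(threshold: float, cluster_size: int) -> int:
    """Scheduling opportunities to pass up before accepting a non-local node.

    A negative threshold (such as the default -1.0) disables delay scheduling.
    Rounding up is an assumption of this sketch.
    """
    if threshold < 0:
        return 0
    if threshold > 1:
        raise ValueError("threshold must be in [0, 1], or negative to disable")
    return math.ceil(threshold * cluster_size)


def offer_node(state: AppLocalityState, node: str, preferred_nodes: set,
               threshold: float, cluster_size: int) -> bool:
    """Return True if the app accepts a container on `node`, False if it waits."""
    if node in preferred_nodes:
        # Data-local placement: accept and reset the counter.
        state.missed_opportunities = 0
        return True
    if state.missed_opportunities >= allowed_skips(threshold, cluster_size):
        # Waited long enough: fall back to a non-local node.
        state.missed_opportunities = 0
        return True
    # Pass up this offer, hoping a node with a replica heartbeats soon.
    state.missed_opportunities += 1
    return False


if __name__ == "__main__":
    state = AppLocalityState()
    replicas = {"node3", "node7", "node12"}  # hypothetical block locations
    # Threshold 0.5 on 25 nodes: pass up 13 offers from a non-replica node.
    decisions = [offer_node(state, "node1", replicas, 0.5, 25) for _ in range(14)]
    print(decisions.count(False), decisions[-1])
```

With a threshold of 0.5 on a 25-node cluster, the sketch declines 13 offers from a node that holds no replica and accepts the 14th; with the default of -1.0 it accepts the first offer, which is the "don't pass up any scheduling opportunities" behaviour.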

-Sandy
On Thu, Oct 3, 2013 at 9:57 AM, André Hacker <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I have a 25-node cluster running Hadoop 2.1.0-beta with the capacity
> scheduler (default scheduler settings) and replication factor 3.
>
> I have exclusive access to the cluster to run a benchmark job and I wonder
> why there are so few data-local and so many rack-local maps.
>
> The input format calculates 44 input splits, resulting in 44 map tasks;
> however, how many of them are processed data-locally seems to be random.
> Here are the counters from my last runs:
>
> Test 1: data-local: 15, rack-local: 29
> Test 2: data-local: 18, rack-local: 26
>
> I don't understand why the maps aren't always 100% data-local. That should
> be possible, since the blocks of my input file are distributed over all
> nodes.
>
> Maybe someone can give me a hint.
>
> Thanks,
> André Hacker, TU Berlin
>
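The counters in the question can be turned into locality percentages with a few lines of Python. The 3/25 figure is simply the chance that an arbitrary node holds one of a block's three replicas on a 25-node cluster; treating it as a baseline for placement that ignores block locations is an illustrative assumption, not a measurement from this thread.

```python
# Counters reported in the question: (data-local maps, rack-local maps).
runs = {"Test 1": (15, 29), "Test 2": (18, 26)}

NODES = 25        # cluster size from the question
REPLICATION = 3   # HDFS replication factor from the question


def data_local_share(data_local: int, rack_local: int) -> float:
    """Fraction of map tasks that ran on a node holding a replica of their split."""
    return data_local / (data_local + rack_local)


if __name__ == "__main__":
    for name, (dl, rl) in runs.items():
        print(f"{name}: {dl}/{dl + rl} maps data-local = {data_local_share(dl, rl):.0%}")
    # Chance that a uniformly random node holds one of a block's replicas.
    print(f"random-placement baseline: {REPLICATION / NODES:.0%}")
```

Both runs add up to the 44 map tasks mentioned, giving 34% and 41% data-local maps, against a 12% chance that a randomly chosen node holds a given block.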