RE: Non data-local scheduling
John Lilley 2013-10-05, 22:19
Is this option set on a per-application-instance basis or is it a cluster-wide setting (or both)?
Is this a MapReduce-specific issue, or a YARN issue?
I don't understand how the problem arises in the first place. For example, if I have an idle cluster with 10 nodes, each node has four containers available at the requested capacity, and my YARN application requests 20 containers with node affinity such that I want two containers per node, why wouldn't YARN give me exactly the task mapping that I requested?
From: Sandy Ryza [mailto:[EMAIL PROTECTED]]
Sent: Thursday, October 03, 2013 12:32 PM
To: [EMAIL PROTECTED]
Subject: Re: Non data-local scheduling
Ah, I was going off the Fair Scheduler equivalent, didn't realize they were different. In that case you might try setting it to something like half the nodes in the cluster.
Nodes are constantly heartbeating to the ResourceManager. When a node heartbeats, the scheduler checks whether the node has any free space and, if it does, offers it to an application. From the application's perspective, this offer is a "scheduling opportunity". Each application will pass up yarn.scheduler.capacity.node-locality-delay scheduling opportunities before placing a container on a non-local node.
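The heartbeat-and-offer mechanism described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not the CapacityScheduler's actual code: the function name `place_container`, the single-application view, and the fixed heartbeat order are all hypothetical, and the rack-local tier is left out.

```python
def place_container(heartbeats, preferred_nodes, locality_delay):
    """Return (node, kind, missed) for the first offer the application accepts.

    heartbeats      -- node names in the order they offer free space
    preferred_nodes -- nodes that hold the task's input data
    locality_delay  -- scheduling opportunities to pass up before accepting
                       a non-local node (illustrative stand-in for
                       yarn.scheduler.capacity.node-locality-delay)
    """
    missed = 0
    for node in heartbeats:
        if node in preferred_nodes:
            # A preferred node offered space: always accept it.
            return node, "node-local", missed
        if missed >= locality_delay:
            # Enough opportunities have been passed up: stop waiting.
            return node, "non-local", missed
        # Decline this offer and keep waiting for a preferred node.
        missed += 1
    return None, "unplaced", missed


offers = ["n1", "n2", "n3", "n4"]
print(place_container(offers, {"n4"}, 0))  # takes the first offer
print(place_container(offers, {"n4"}, 3))  # waits long enough to reach n4
```

In this model a delay of 0 (or a negative value) accepts whatever node heartbeats first, which is one way a job on an otherwise idle cluster can still end up with many non-local containers.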
On Thu, Oct 3, 2013 at 10:36 AM, André Hacker <[EMAIL PROTECTED]> wrote:
Thanks, but I can't set this to a fraction; it wants an integer.
My documentation is slightly different:
"Number of missed scheduling opportunities after which the CapacityScheduler attempts to schedule rack-local containers. Typically this should be set to number of racks in the cluster, this feature is disabled by default, set to -1."
I set it to 1 and now I had 33 data-local and 11 rack-local tasks, which is better, but still not optimal.
I couldn't find a good description of what this feature means (what is a scheduling opportunity, and how many are there?). It does not seem to be in the current documentation: http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
2013/10/3 Sandy Ryza <[EMAIL PROTECTED]>
Try setting yarn.scheduler.capacity.node-locality-delay to a number between 0 and 1. This will turn on delay scheduling - here's the doc on how this works:
For applications that request containers on particular nodes, the number of scheduling opportunities since the last container assignment to wait before accepting a placement on another node. Expressed as a float between 0 and 1, which, as a fraction of the cluster size, is the number of scheduling opportunities to pass up. The default value of -1.0 means don't pass up any scheduling opportunities.
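As a rough illustration of the fraction-based rule quoted above, the arithmetic can be written out as follows. The helper name `opportunities_to_pass_up` is hypothetical, and how the real scheduler rounds or compares the result is not covered by the quoted text, so the value is left as a float.

```python
def opportunities_to_pass_up(threshold, cluster_nodes):
    """Scheduling opportunities to skip for a fractional locality threshold.

    threshold     -- float between 0 and 1, or a negative value (such as the
                     default -1.0) meaning "don't pass up any opportunities"
    cluster_nodes -- number of nodes in the cluster
    """
    if threshold < 0:
        return 0
    return threshold * cluster_nodes


# On the 25-node cluster from this thread, a threshold of 0.5 means
# passing up 12.5 scheduling opportunities, i.e. about half the nodes.
print(opportunities_to_pass_up(0.5, 25))
```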
On Thu, Oct 3, 2013 at 9:57 AM, André Hacker <[EMAIL PROTECTED]> wrote:
I have a 25-node cluster running Hadoop 2.1.0-beta, with the Capacity Scheduler (default scheduler settings) and a replication factor of 3.
I have exclusive access to the cluster to run a benchmark job and I wonder why there are so few data-local and so many rack-local maps.
The input format calculates 44 input splits and 44 map tasks; however, how many of them are processed data-locally seems to be random. Here are the counters from my last runs:
data-local / rack-local:
Test 1: data-local: 15, rack-local: 29
Test 2: data-local: 18, rack-local: 26
I don't understand why the maps are not always 100% data-local. This should not be a problem, since the blocks of my input file are distributed over all nodes.
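To put the counters above in perspective, the data-local share of each run can be computed with a few lines. The helper name `data_local_share` is made up for this illustration; the numbers are the ones reported in the two tests.

```python
def data_local_share(data_local, rack_local):
    """Fraction of map tasks that ran on a node holding their input block."""
    total = data_local + rack_local
    return data_local / total


# Counters from the two benchmark runs (44 map tasks each).
runs = {"Test 1": (15, 29), "Test 2": (18, 26)}
for name, (data_local, rack_local) in runs.items():
    print(f"{name}: {data_local_share(data_local, rack_local):.1%} data-local")
```

Roughly a third to two fifths of the maps were data-local in these runs.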
Maybe someone can give me a hint.
André Hacker, TU Berlin