Capacity Scheduler questions 2013-01-06, 21:31
We are evaluating the Capacity Scheduler…
We would like to configure the equivalent of the Fair Scheduler's userMaxJobsDefault = 1 (i.e. we would like to limit a user to a single job in the cluster).

· By default, the Capacity Scheduler allows multiple jobs from a single user to run concurrently.

· From http://hortonworks.com/blog/understanding-apache-hadoops-capacity-scheduler/ there appear to be limits for “the number of accepted/active jobs per user”. However, the example capacity-scheduler.xml only has limits for active tasks, e.g. the <queue>.maximum-initialized-active-tasks-per-user property.

· Also, the source CapacitySchedulerConf.java includes the following code, which suggests that the maximum jobs per user can be configured via the init-accept-jobs-factor property. However, this is not clear from the description of this property.

    public int getInitToAcceptJobsFactor(String queue) {
      int initToAccepFactor =
        rmConf.getInt(toFullPropertyName(queue, "init-accept-jobs-factor"),
                      defaultInitToAcceptJobsFactor);
      if(initToAccepFactor <= 0) {
        throw new IllegalArgumentException(
            "Invalid maximum jobs per user configuration " + initToAccepFactor);
      }
      return initToAccepFactor;
    }

· Also, other posts and sample xml files on the web refer to the mapred.capacity-scheduler.default-maximum-initialized-jobs-per-user property. However, I’ve tried setting this to 1 and it has no impact.

So… how can we configure the Capacity Scheduler to limit a user to a single job in the cluster?

Thanks,
Stuart

Also, I’m curious… a benefit of the Capacity Scheduler is that resource limits can be specified in percentage terms, so if the cluster size changed, the CS configuration would not have to change. Why, then, are some properties specified in terms of tasks, e.g. mapred.capacity-scheduler.queue.<queue>.maximum-initialized-active-tasks-per-user, which would need to be reconfigured if the cluster size changed?