Hadoop >> mail # user >> Capacity Scheduler questions


Capacity Scheduler questions
We are evaluating the Capacity Scheduler…

We would like to configure the equivalent of Fair Scheduler
userMaxJobsDefault = 1 (i.e. we would like to limit a user to a single job
in the cluster).

·         By default the Capacity Scheduler allows multiple jobs from a
single user to run concurrently.

·         According to
http://hortonworks.com/blog/understanding-apache-hadoops-capacity-scheduler/
there appear to be limits for “the number of accepted/active jobs per user”.
However, the example capacity-scheduler.xml only has limits for active
tasks, e.g. the <queue>.maximum-initialized-active-tasks-per-user property.

·         Also, the source file CapacitySchedulerConf.java includes the
following code, which suggests that the maximum jobs per user can be
configured via the init-accept-jobs-factor property. However, this is not
clear from the description of this property.

  public int getInitToAcceptJobsFactor(String queue) {
    int initToAccepFactor =
      rmConf.getInt(toFullPropertyName(queue, "init-accept-jobs-factor"),
          defaultInitToAcceptJobsFactor);
    if(initToAccepFactor <= 0) {
      throw new IllegalArgumentException(
          "Invalid maximum jobs per user configuration " +
          initToAccepFactor);
    }
    return initToAccepFactor;
  }

·         Also, other posts and sample XML files on the web refer to the
mapred.capacity-scheduler.default-maximum-initialized-jobs-per-user
property. However, I’ve tried setting this to 1 and it has no effect.
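
For anyone who wants to experiment with the lookup-and-validate pattern in the quoted getInitToAcceptJobsFactor method outside of Hadoop, here is a minimal stand-alone sketch. It is not the scheduler's actual code: java.util.Properties stands in for rmConf, the toFullPropertyName helper is a guess based on the property names mentioned in this thread, and the default value of 10 is an assumption made purely for illustration.

```java
import java.util.Properties;

// Stand-alone sketch of the lookup-and-validate pattern quoted above.
// java.util.Properties stands in for the scheduler's rmConf object.
class InitAcceptFactorSketch {
    // Prefix taken from the property names mentioned in this thread.
    static final String QUEUE_PREFIX = "mapred.capacity-scheduler.queue.";
    // Hypothetical default, chosen for this illustration only.
    static final int ASSUMED_DEFAULT_FACTOR = 10;

    // Hypothetical equivalent of toFullPropertyName(queue, property).
    static String toFullPropertyName(String queue, String property) {
        return QUEUE_PREFIX + queue + "." + property;
    }

    // Reads the per-queue factor, falls back to the assumed default when the
    // property is absent, and rejects non-positive values.
    static int getInitToAcceptJobsFactor(Properties conf, String queue) {
        String key = toFullPropertyName(queue, "init-accept-jobs-factor");
        String raw = conf.getProperty(key);
        int factor = (raw == null)
            ? ASSUMED_DEFAULT_FACTOR
            : Integer.parseInt(raw.trim());
        if (factor <= 0) {
            throw new IllegalArgumentException(
                "Invalid maximum jobs per user configuration " + factor);
        }
        return factor;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(
            toFullPropertyName("default", "init-accept-jobs-factor"), "1");
        // Queue with an explicit value.
        System.out.println(getInitToAcceptJobsFactor(conf, "default"));
        // Queue without a value falls back to the assumed default.
        System.out.println(getInitToAcceptJobsFactor(conf, "research"));
    }
}
```

Note that the exception text talks about "maximum jobs per user" although the value being validated is a factor, which is exactly the ambiguity raised in the bullet about CapacitySchedulerConf.java.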

So… how can we configure the Capacity Scheduler to limit a user to a single
job in the cluster?

Thanks,

            Stuart

Also, I’m curious… A benefit of the Capacity Scheduler is that resource
limits can be specified in percentage terms, so if the cluster size changes
the CS configuration does not have to change. Why, then, are some
properties specified in terms of tasks, e.g.
mapred.capacity-scheduler.queue.<queue>.maximum-initialized-active-tasks-per-user,
which would need to be reconfigured if the cluster size changed?
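
To make the concern concrete, here is a small arithmetic sketch of how a percentage-based limit and an absolute task limit behave as a cluster grows. All numbers are made up, and comparing the task limit directly against the queue's slots is a simplification for illustration, not a description of how the scheduler applies the property.

```java
// Sketch comparing a percentage-based limit with an absolute task limit as
// the cluster grows. All numbers are hypothetical.
class CapacityScalingSketch {
    // Slots available to a queue configured with a percentage capacity.
    static int slotsForPercentCapacity(int clusterSlots, int capacityPercent) {
        return clusterSlots * capacityPercent / 100;
    }

    // Share of the queue's slots (in percent) that a fixed task cap represents.
    static double taskCapAsPercentOfQueue(int taskCap, int queueSlots) {
        return 100.0 * taskCap / queueSlots;
    }

    public static void main(String[] args) {
        int capacityPercent = 25; // hypothetical queue capacity
        int perUserTaskCap = 50;  // hypothetical absolute per-user task limit
        for (int clusterSlots : new int[] {200, 400}) {
            int queueSlots =
                slotsForPercentCapacity(clusterSlots, capacityPercent);
            double capShare =
                taskCapAsPercentOfQueue(perUserTaskCap, queueSlots);
            System.out.printf(
                "clusterSlots=%d queueSlots=%d perUserCapShare=%.1f%%%n",
                clusterSlots, queueSlots, capShare);
        }
    }
}
```

In this toy setup the queue's slot count follows the cluster size automatically, while the fixed task cap covers a shrinking share of the queue unless it is edited by hand.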