Re: Capacity Scheduler questions
> We would like to configure the equivalent of the Fair Scheduler's
> userMaxJobsDefault = 1 (i.e. we would like to limit a user to a single job
> in the cluster).
>
> - By default the Capacity Scheduler allows multiple jobs from a single
>   user to run concurrently.
>
> - From
>   http://hortonworks.com/blog/understanding-apache-hadoops-capacity-scheduler/
>   there appear to be limits on "the number of accepted/active jobs per
>   user". However, the example capacity-scheduler.xml only has limits for
>   active tasks, e.g. the <queue>.maximum-initialized-active-tasks-per-user
>   property.
>
> - Also, the source CapacitySchedulerConf.java includes the following
>   code, which suggests that the maximum jobs per user can be configured
>   via the init-accept-jobs-factor property. However, this is not clear
>   from the description of this property.
>
>   public int getInitToAcceptJobsFactor(String queue) {
>     int initToAccepFactor =
>       rmConf.getInt(toFullPropertyName(queue, "init-accept-jobs-factor"),
>           defaultInitToAcceptJobsFactor);
>     if (initToAccepFactor <= 0) {
>       throw new IllegalArgumentException(
>           "Invalid maximum jobs per user configuration " + initToAccepFactor);
>     }
>     return initToAccepFactor;
>   }
>
A single job in the whole cluster for that user, or per shared queue? For a
specific user, or for every user?

Limits for specific users should be handled via dedicated queues.

If you want the limit for every user in every shared queue, you can use
init-accept-jobs-factor. The scheduler computes the per-user limits as:
    int maxSystemJobs = conf.getMaxSystemJobs();
    float capacityPercent = conf.getCapacity(queueName);
    // ulMin is the queue's minimum-user-limit-percent
    int maxJobsPerUserToInit =
        (int) Math.ceil(maxSystemJobs * capacityPercent/100.0 * ulMin/100.0);
    int jobInitToAcceptFactor = conf.getInitToAcceptJobsFactor(queueName);
    int maxJobsPerUserToAccept = maxJobsPerUserToInit * jobInitToAcceptFactor;

So, if maximum-system-jobs is 1000 and the queue capacity is 10%, then
assuming a user limit of 100% (one user at a time), maxJobsPerUserToInit
will be 100, so you will need to set the init-accept-jobs-factor for that
queue to 100. If the user limit is, say, 20% (a maximum of 5 users at a
time), then maxJobsPerUserToInit will be 20, so you will need to set the
factor to 20. It's a bit involved, but achievable.

Also see http://hadoop.apache.org/docs/stable/capacity_scheduler.html for
reference.
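For the first example above (queue capacity 10%, user limit 100%, factor
100), the relevant capacity-scheduler.xml entries would look roughly like
this (the queue name "default" and all values are illustrative; check the
exact property names against your Hadoop version):

```xml
<!-- capacity-scheduler.xml (MRv1) - illustrative values only -->
<property>
  <name>mapred.capacity-scheduler.maximum-system-jobs</name>
  <value>1000</value>
</property>
<property>
  <name>mapred.capacity-scheduler.queue.default.capacity</name>
  <value>10</value>
</property>
<property>
  <name>mapred.capacity-scheduler.queue.default.minimum-user-limit-percent</name>
  <value>100</value>
</property>
<property>
  <name>mapred.capacity-scheduler.queue.default.init-accept-jobs-factor</name>
  <value>100</value>
</property>
```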

> - Also, other posts and sample XML files on the web refer to the
>   mapred.capacity-scheduler.default-maximum-initialized-jobs-per-user
>   property. However, I've tried setting this to 1 but it has no impact.
>

This configuration controls the number of 'initialized' jobs per user.
Initialized jobs occupy more memory in the JobTracker than non-initialized
ones, so this property caps the number of initialized jobs rather than the
number of jobs a user can run.

Can you state your original requirement? You don't want any user to run
more than one job at a time? Why?

> Also, I'm curious… a benefit of the Capacity Scheduler is that resource
> limits can be specified in percentage terms, so if the cluster size changed
> the CS configuration would not have to change. Therefore, why are some
> properties specified in terms of tasks e.g.
> mapred.capacity-scheduler.queue.<queue>.maximum-initialized-active-tasks-per-user
> which would need to be reconfigured if the cluster size changed?

Some of these limits exist to control static resources, like JobTracker
memory, and so are not a function of cluster capacity.

HTH
--
+Vinod
Hortonworks Inc.
http://hortonworks.com/