HDFS >> mail # user >> Using CapacityScheduler to divide resources between jobs (not users)


Using CapacityScheduler to divide resources between jobs (not users)
Hi all,

I'm running Hadoop 1.0.4 on a modest cluster (~20 machines).
The jobs running on the cluster can be divided (resource-wise) as follows:

1. Very short jobs: less than 1 minute.
2. Normal jobs: 2-3 minutes up to an hour or two.
3. Very long jobs: days of processing (not running yet, and the reason for
my inquiry here).

I was thinking of using the CapacityScheduler to divide the cluster
resources so that the long jobs can run without disturbing the other jobs.
I read that the queue for such jobs should also be given an upper bound.
Otherwise it may take over the entire cluster while the cluster is idle and,
since its jobs take a long time to finish, it won't release those resources
to the other queues when they need them. Is that so?
Any advice about using the CapacityScheduler for this use case?
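
To make the question concrete, here is a rough sketch of the kind of
capacity-scheduler.xml I have in mind. The queue name "long" and the
percentages are made up for illustration, and I'm assuming the Hadoop 1.x
property names (mapred.capacity-scheduler.queue.<queue-name>.capacity and
.maximum-capacity) behave the way I understand them from the docs. It also
assumes both queues are listed in mapred.queue.names in mapred-site.xml.

```xml
<?xml version="1.0"?>
<!-- Sketch only: queue name "long" and all percentages are hypothetical. -->
<configuration>
  <!-- Short and normal jobs: guaranteed 70% of the slots. -->
  <property>
    <name>mapred.capacity-scheduler.queue.default.capacity</name>
    <value>70</value>
  </property>

  <!-- Very long jobs: guaranteed 30% of the slots. -->
  <property>
    <name>mapred.capacity-scheduler.queue.long.capacity</name>
    <value>30</value>
  </property>

  <!-- Upper bound: the long queue may grow past its 30% when the cluster
       is idle, but never beyond 40% of the slots. -->
  <property>
    <name>mapred.capacity-scheduler.queue.long.maximum-capacity</name>
    <value>40</value>
  </property>
</configuration>
```

The long jobs would then be submitted with -Dmapred.job.queue.name=long, and
everything else would go to the default queue.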

Thanks, and sorry for re-sending this message.

Amit.