I'm running Hadoop 1.0.4 on a modest cluster (~20 machines) and I would
like to partition my cluster resources by job processing time.
The jobs running on the cluster can be divided as follows:
1. Very short jobs: less than 1 minute.
2. Normal jobs: 2-3 minutes up to an hour or two.
3. Very long jobs: days of processing. (Not yet running on the cluster,
and the reason for my inquiry here.)
I was thinking of using the Capacity Scheduler and dividing the cluster
resources so that the long jobs can run without disturbing the other jobs
and the very short jobs won't be delayed.
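Concretely, I was planning a conf/capacity-scheduler.xml along these lines
(the queue names "short", "normal", and "long" and the percentages are just
placeholders I made up; the queues would also need to be declared via
mapred.queue.names in mapred-site.xml, with the scheduler enabled through
mapred.jobtracker.taskScheduler):

```xml
<!-- conf/capacity-scheduler.xml (sketch; queue names and values are placeholders) -->
<configuration>
  <!-- guaranteed share for very short jobs -->
  <property>
    <name>mapred.capacity-scheduler.queue.short.capacity</name>
    <value>20</value>
  </property>
  <!-- guaranteed share for normal jobs -->
  <property>
    <name>mapred.capacity-scheduler.queue.normal.capacity</name>
    <value>50</value>
  </property>
  <!-- guaranteed share for the very long jobs -->
  <property>
    <name>mapred.capacity-scheduler.queue.long.capacity</name>
    <value>30</value>
  </property>
  <!-- hard cap so the long queue can never take over the whole cluster -->
  <property>
    <name>mapred.capacity-scheduler.queue.long.maximum-capacity</name>
    <value>50</value>
  </property>
</configuration>
```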
From what I understand, if a very long job is running and cluster
resources are free, it will use them all (unless its queue has an upper
bound). But once a job from another queue starts, will that queue reclaim
its resources even though the very long job is still running? Or will the
lack of preemption prevent that?
Any other advice about using the CapacityScheduler for this use case?