I'm running Hadoop 1.0.4 on a modest cluster (~20 machines). The jobs running on the cluster can be divided (resource-wise) as follows:
1. Very short jobs: less than 1 minute.
2. Normal jobs: 2-3 minutes up to an hour or two.
3. Very long jobs: days of processing (still not active, and the reason for my inquiries here).
I was thinking of using the CapacityScheduler and dividing the cluster resources so that the long jobs can run without disturbing the other jobs. I read that such a queue should also have an upper bound, because once the cluster is otherwise free it may take over all of the resources, and since its jobs take a long time to finish, it won't release them back to the other queues as it should. Is that so? Any advice on using the CapacityScheduler for this use case?
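For what it's worth, here is a rough sketch of the kind of configuration I had in mind (queue names "short", "normal", and "long" are my own invention; the capacity percentages are placeholders, not recommendations). In Hadoop 1.x the queues themselves are declared in mapred-site.xml via mapred.queue.names, and the CapacityScheduler settings go in capacity-scheduler.xml:

```xml
<!-- capacity-scheduler.xml (sketch, Hadoop 1.x style) -->
<configuration>
  <!-- Guaranteed share for the long-running jobs queue. -->
  <property>
    <name>mapred.capacity-scheduler.queue.long.capacity</name>
    <value>20</value>
  </property>
  <!-- Hard upper bound so the long queue cannot absorb the whole
       cluster when it is otherwise idle; this is the "upper bound"
       I was asking about. -->
  <property>
    <name>mapred.capacity-scheduler.queue.long.maximum-capacity</name>
    <value>40</value>
  </property>
  <!-- Remaining capacity split between the other queues. -->
  <property>
    <name>mapred.capacity-scheduler.queue.short.capacity</name>
    <value>30</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.normal.capacity</name>
    <value>50</value>
  </property>
</configuration>
```

My understanding is that without the maximum-capacity setting, a queue is allowed to grow beyond its guaranteed capacity when other queues are idle, which is exactly the scenario I'm worried about with long jobs. Please correct me if I've misread the documentation.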
Thanks, and sorry for re-sending this message.