On Jun 3, 2010, at 1:45 AM, Alex Munteanu wrote:
> I am running several different mapreduce jobs. For some of them it is
> better to have a rather high number of running map tasks per node,
> whereas others do very intensive read operations on our database
> resulting in read timeouts. So for these jobs I'd like to set a much
> smaller limit of concurrently running map tasks.
IIRC, 0.21 plus the capacity scheduler has a feature that might be useful here: the cluster defines a (global) default memory size per map slot, and a job can declare that its tasks need more. For the jobs you want to throttle, declare double the default, and the capacity scheduler will count each of those tasks as occupying two slots, so it schedules only one of them per node where it would otherwise schedule two.
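For reference, a sketch of what that looks like, using the property names from the 0.21-era capacity scheduler's memory-based scheduling as I recall them (double-check against your version's docs; the values below are made up). Cluster side, in mapred-site.xml:

```xml
<!-- memory represented by a single map slot (cluster-wide default) -->
<property>
  <name>mapred.cluster.map.memory.mb</name>
  <value>1024</value>
</property>

<!-- upper bound a single job may request per map task -->
<property>
  <name>mapred.cluster.max.map.memory.mb</name>
  <value>4096</value>
</property>
```

Then the DB-heavy job requests double the slot size at submission time, e.g. `-Dmapred.job.map.memory.mb=2048` on the command line (or the equivalent in the job conf), which makes each of its map tasks consume two slots and halves the number running concurrently per node.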