EC2, Max tasks, under utilized?
Saptarshi Guha 2009-06-23, 14:43
I'm running a 90 node c1.xlarge cluster. No reducers, mapred.max.map.tasks=6
The AMI is my own and uses Hadoop 0.19.1.
The dataset has 145K keys, and the processing time is huge.
Now, when I set mapred.map.tasks=14,000, what ends up running is 49 map
tasks, across the machines.
No machine is running more than 3 tasks; most are running 1, some are running
Looking at the map records read, it appears these 49 tasks correspond to
the 145k records.
Q) Why? Why isn't the number of running tasks much higher? If each machine
can run 6, then why not make this a higher number and run across the
This is underutilization.
So I set mapred.map.tasks=90.
On the Hadoop machine list, all 90 machines are running at least 1 task: mostly 1,
some 2, and a small few 3+ (max 4).
On the job tracker page, only 23 are running and 48 pending (when I sent this
With 90 machines (and a Map Task Capacity of 540), why aren't 90 running at
What should be set? What isn't set?
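
To put numbers on the underutilization, here is a rough sketch of the arithmetic. It assumes the Map Task Capacity of 540 is simply 90 nodes times 6 map slots per node; the function name slot_utilization is made up for illustration and is not part of Hadoop.

```python
def slot_utilization(running_tasks, nodes=90, slots_per_node=6):
    """Return (cluster map-slot capacity, fraction of slots in use).

    Assumption: capacity = nodes * slots_per_node, matching the
    Map Task Capacity of 540 reported for this cluster.
    """
    capacity = nodes * slots_per_node
    return capacity, running_tasks / capacity


# The two runs described above: 49 running tasks with
# mapred.map.tasks=14,000, and 23 running with mapred.map.tasks=90.
for label, running in [("mapred.map.tasks=14,000", 49),
                       ("mapred.map.tasks=90", 23)]:
    capacity, used = slot_utilization(running)
    print(f"{label}: {running}/{capacity} map slots busy ({used:.1%})")
```

Under that assumption, the first run uses roughly 9% of the available map slots and the second roughly 4%.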