I should also point out that I'm using a SequenceFileInputFormat.
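
Since the input format matters here, a rough sketch of how a file-based input format might turn the requested map count into splits may be useful. It assumes the split-size rule of the old `org.apache.hadoop.mapred.FileInputFormat` (split size = max(minSize, min(goalSize, blockSize)), with a 10% slop on the last split), where `mapred.map.tasks` only sets the goal size. The file size, block size, and function names below are hypothetical, and the sketch does not claim to explain the 49 tasks seen in this job.

```python
# Sketch of FileInputFormat-style split arithmetic (old mapred API).
# All sizes are in bytes; the example values are hypothetical.

SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split_size


def split_size(goal_size, min_size, block_size):
    """Split size rule: never below min_size, never above the block size
    unless min_size forces it."""
    return max(min_size, min(goal_size, block_size))


def count_splits(file_size, size):
    """Count the splits produced for one file, folding a small tail
    into the final split."""
    remaining = file_size
    n = 0
    while remaining / size > SPLIT_SLOP:
        n += 1
        remaining -= size
    if remaining != 0:
        n += 1
    return n


def estimate_map_tasks(file_size, requested_maps, block_size, min_size=1):
    """requested_maps plays the role of mapred.map.tasks: it is only a hint
    that sets the goal size."""
    goal_size = file_size // max(requested_maps, 1)
    return count_splits(file_size, split_size(goal_size, min_size, block_size))


if __name__ == "__main__":
    one_gib = 1024 ** 3          # hypothetical single input file
    block = 64 * 1024 ** 2       # hypothetical 64 MiB block size
    for hint in (1, 90, 14000):
        print(hint, estimate_map_tasks(one_gib, hint, block))
```

Under these assumptions a small hint is capped by the block size, and a large minimum split size (for example via `mapred.min.split.size`) overrides the hint entirely.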
On Tue, Jun 23, 2009 at 10:43 AM, Saptarshi Guha wrote:
> I'm running a 90 node c1.xlarge cluster. No reducers,
> mapred.max.map.tasks=6 per machine.
> The AMI is my own and uses Hadoop 0.19.1.
> The dataset has 145K keys, and the processing time is huge.
> Now, when I set mapred.map.tasks=14,000, what ends up running is 49 map
> tasks across the machines.
> No machine is running more than 3 tasks; most are running 1, and some are
> running 0.
> Looking at the map records read, it appears these 49 tasks correspond to
> the 145k records.
> Q) Why? Why isn't the number of running tasks much higher? If each machine
> can run 6, then why not make this a higher number and run across the
> This is underutilization.
> So I set the mapred.map.tasks=90.
> On the Hadoop machine list, all 90 machines have at least 1 task: mostly 1,
> some 2, and a small few 3+ (max 4).
> On the JobTracker page, only 23 are running and 48 are pending (when I sent this
> With 90 machines (and a Map Task Capacity of 540), why aren't 90 running at
> one go?
> What should be set? What isn't set?
> Saptarshi Guha
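
The capacity figures in the quoted message can be checked with a short sketch. It assumes 6 map slots on each of the 90 machines, as stated above; the function names are hypothetical, and the numbers are simply the ones reported in the message.

```python
# Sketch of the map-slot arithmetic from the quoted message.
# Assumes 90 nodes with 6 map slots each, as reported there.


def map_capacity(nodes, slots_per_node):
    """Total number of map tasks the cluster can run at once."""
    return nodes * slots_per_node


def max_concurrent(total_map_tasks, capacity):
    """A job cannot run more maps at once than it has tasks or slots."""
    return min(total_map_tasks, capacity)


def utilization(running, capacity):
    """Fraction of map slots in use."""
    return running / capacity


if __name__ == "__main__":
    capacity = map_capacity(90, 6)
    print("capacity:", capacity)
    # 49 map tasks in the first run, 23 running in the second run
    for running in (49, 23):
        print(running, "running:", round(utilization(running, capacity), 3))
```

With only 49 map tasks in the job, at most 49 of the 540 slots can ever be busy, so the number of splits bounds utilization well before the per-machine slot limit does.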