Hadoop user mailing list

Re: EC2, Max tasks, under utilized?
Hello,
I should also point out that I'm using a SequenceFileInputFormat.

Regards
Saptarshi Guha
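
Since the input goes through SequenceFileInputFormat, the split arithmetic is worth spelling out. As far as we understand the old org.apache.hadoop.mapred API of this era, mapred.map.tasks is only a hint passed to InputFormat.getSplits(). FileInputFormat divides the total input size by the hint to get a goal size, then clamps it between a minimum split size and the file's block size, and SequenceFileInputFormat raises that minimum to SequenceFile.SYNC_INTERVAL (2000 bytes, to our knowledge). The Java sketch below is a simplified single-file model of that arithmetic, not the Hadoop source; the class and method names, the 64 MB block size, and the 1,000,000-byte file are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of old-API FileInputFormat split sizing for
 * a single file. Not the Hadoop source; the constants below are assumptions.
 */
public class SplitEstimate {
    // A split is only cut while the remainder exceeds 110% of the split size.
    static final double SPLIT_SLOP = 1.1;

    // Goal size clamped to [minSize, blockSize]; minSize (assumed >= 1) wins on conflict.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    /** Returns the split lengths for one file that makes up the whole input. */
    static List<Long> splits(long fileLength, int numSplitsHint,
                             long minSize, long blockSize) {
        long goalSize = fileLength / (numSplitsHint == 0 ? 1 : numSplitsHint);
        long splitSize = computeSplitSize(goalSize, minSize, blockSize);
        List<Long> result = new ArrayList<>();
        long bytesRemaining = fileLength;
        while ((double) bytesRemaining / splitSize > SPLIT_SLOP) {
            result.add(splitSize);
            bytesRemaining -= splitSize;
        }
        if (bytesRemaining != 0) {
            result.add(bytesRemaining); // the tail becomes the last split
        }
        return result;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // assumed 64 MB HDFS block size
        long minSize = 2000;                // assumed SequenceFile.SYNC_INTERVAL
        long fileLength = 1_000_000;        // hypothetical input size in bytes
        for (int hint : new int[] {14000, 90}) {
            int n = splits(fileLength, hint, minSize, blockSize).size();
            System.out.println("hint=" + hint + " -> " + n + " splits");
        }
    }
}
```

Run as is, it should print `hint=14000 -> 500 splits` and `hint=90 -> 90 splits`: a very large hint stops helping once the goal size falls below the minimum split size. With many input files the picture changes again, because each input file yields at least one split of its own.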
On Tue, Jun 23, 2009 at 10:43 AM, Saptarshi Guha
<[EMAIL PROTECTED]> wrote:

> Hello,
> I'm running a 90-node c1.xlarge cluster with no reducers and
> mapred.max.map.tasks=6 per machine.
> The AMI is my own and uses Hadoop 0.19.1.
> The dataset has 145K keys, and the processing time is huge.
>
> Now, when I set mapred.map.tasks=14,000, only 49 map tasks end up running
> across the machines.
> No machine is running more than 3 tasks; most are running 1, and some are
> running 0.
> Looking at the map records read, it appears these 49 tasks account for all
> 145K records.
> Q) Why? Why isn't the number of running tasks much higher? If each machine
> can run 6, why not use more tasks and spread them across the
> machines?
> This is underutilization.
>
> So I set mapred.map.tasks=90.
> In the Hadoop machine list, all 90 machines are running at least 1 task:
> mostly 1, some 2, and a few 3+ (max 4).
> On the JobTracker page, only 23 are running and 48 are pending (as of when I
> sent this email).
> With 90 machines (and a Map Task Capacity of 540), why aren't all 90 running
> at once?
>
> What should be set? What isn't set?
>
> Regards
> Saptarshi Guha
>
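
On the second question (90 map tasks on 540 slots, yet mostly one task per machine and many pending), two scheduler behaviors may be involved; we have not verified either against the 0.19.1 source. First, tasks are handed out only in replies to TaskTracker heartbeats, so a job ramps up over several heartbeat intervals rather than starting all at once. Second, our understanding is that the default JobQueueTaskScheduler caps each TaskTracker's usable map slots by a cluster-wide load factor, roughly remaining map tasks divided by total map capacity. The Java sketch below models only that cap; the class and method names are hypothetical, and it ignores data locality, multiple jobs, and speculative execution.

```java
/**
 * Hypothetical model of load-factor capping in a JobTracker scheduler: a
 * TaskTracker may use only ceil(remainingMaps * maxSlots / clusterCapacity)
 * of its map slots, and never more than maxSlots.
 */
public class LoadFactorCap {

    static int usableSlots(int remainingMaps, int clusterCapacity, int trackerMaxSlots) {
        if (clusterCapacity <= 0) {
            return 0; // no capacity reported yet, so hand out nothing
        }
        // Integer ceiling division avoids floating-point rounding surprises.
        long scaled = (long) remainingMaps * trackerMaxSlots;
        long capped = (scaled + clusterCapacity - 1) / clusterCapacity;
        return (int) Math.min(capped, trackerMaxSlots);
    }

    public static void main(String[] args) {
        int machines = 90;
        int slotsPerMachine = 6;
        int capacity = machines * slotsPerMachine; // 540, as reported in the question
        for (int maps : new int[] {90, 14000}) {
            System.out.println(maps + " maps -> "
                    + usableSlots(maps, capacity, slotsPerMachine)
                    + " usable slot(s) per tracker");
        }
    }
}
```

Under these assumptions it prints `90 maps -> 1 usable slot(s) per tracker` and `14000 maps -> 6 usable slot(s) per tracker`, which would be consistent with most machines running a single task when only 90 maps are submitted.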