Re: EC2, Max tasks, under utilized?
Hello,
I should also point out that I'm using a SequenceFileInputFormat.

Regards
Saptarshi Guha
On Tue, Jun 23, 2009 at 10:43 AM, Saptarshi Guha
<[EMAIL PROTECTED]> wrote:

> Hello,
> I'm running a 90 node c1.xlarge cluster. No reducers,
> mapred.max.map.tasks=6 per machine.
> The AMI is my own and uses Hadoop 0.19.1.
> The dataset has 145K keys, and the processing time is huge.
>
> Now, when I set mapred.map.tasks=14,000, what ends up running is 49 map
> tasks across the machines.
> No machine is running more than 3 tasks; most are running 1, and some are
> running 0.
> Looking at the map records read, it appears these 49 tasks correspond to
> the 145K records.
> Q) Why? Why isn't the number of running tasks much higher? If each machine
> can run 6, then why not create more tasks and spread them across the
> machines?
> The cluster is underutilized.
>
> So I set mapred.map.tasks=90.
> On the Hadoop machine list, all 90 machines have at least 1 task: mostly 1,
> some 2, and a few 3 or more (max 4).
> On the job tracker page, only 23 are running and 48 are pending (when I sent
> this email).
> With 90 machines (and a Map Task Capacity of 540), why aren't 90 running at
> once?
>
> What should be set? What isn't set?
>
> Regards
> Saptarshi Guha
>
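
For anyone following the thread: mapred.map.tasks is only a hint to the input format, so the number of map tasks actually created depends on how the input files are split. Below is a rough Python sketch of the split arithmetic, assuming the old-API FileInputFormat rule splitSize = max(minSize, min(goalSize, blockSize)) and a 1.1 slack factor on the last split. The function names, the example file size and the block size are all hypothetical and for illustration only; this is not the Hadoop source, and SequenceFileInputFormat may impose its own minimum split size.

```python
# Illustrative sketch only: approximates how a file-based input format might
# turn a requested map count into actual splits. Names are hypothetical.

SPLIT_SLOP = 1.1  # assumed slack: the last split may be up to 10% oversized


def compute_split_size(goal_size, min_size, block_size):
    """Assumed rule: never below min_size, never above block_size."""
    return max(min_size, min(goal_size, block_size))


def estimate_splits(file_sizes, requested_maps, block_size, min_size=1):
    """Return (split_size, number_of_splits) for the given input files."""
    total = sum(file_sizes)
    goal_size = total // max(requested_maps, 1)
    split_size = compute_split_size(goal_size, min_size, block_size)
    count = 0
    for size in file_sizes:
        remaining = size
        # Carve off full splits while clearly more than one split remains.
        while remaining / split_size > SPLIT_SLOP:
            count += 1
            remaining -= split_size
        if remaining > 0:
            count += 1  # final, possibly short, split of this file
    return split_size, count


def map_slot_capacity(nodes, slots_per_node):
    """Cluster-wide number of concurrent map slots."""
    return nodes * slots_per_node


if __name__ == "__main__":
    MB = 1024 * 1024
    # Hypothetical input: one 640 MB file on 64 MB blocks.
    print(estimate_splits([640 * MB], 10, 64 * MB))
    print(estimate_splits([640 * MB], 14000, 64 * MB))
    # 90 nodes with 6 map slots each, as in the original message.
    print(map_slot_capacity(90, 6))
```

Comparing the estimated split count with the slot capacity (90 x 6 = 540 in the setup above) shows whether the job even has enough tasks to fill the cluster. If the count is far below the requested value, the minimum split size or the size and number of the input files is the first thing to check.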