Hadoop >> mail # user >> Help: How to increase amont maptasks per job ?


Re: Help: How to increase amont maptasks per job ?
You said you have a large amount of data.
How large is that approximately?
Did you compress the intermediate data (with what codec)?

Niels
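
Why the data size and the codec matter, as a rough sketch: with file-based input formats, the number of map tasks generally follows the number of input splits, and a file compressed with a non-splittable codec (gzip, for example) is read by a single map task regardless of its size. Whether that is what happens in this job is not confirmed anywhere in this thread. The Python below only approximates the split-size rule of the old-API FileInputFormat; the function names, the 64 MB block size, and the file sizes are assumptions chosen for illustration, and the real implementation also applies a small slop factor that is ignored here.

```python
import math

MiB = 1024 ** 2
GiB = 1024 ** 3


def split_size(goal_size, min_size, block_size):
    # Approximation of the old-API rule: max(minSize, min(goalSize, blockSize)).
    return max(min_size, min(goal_size, block_size))


def estimate_map_tasks(file_sizes, splittable, block_size=64 * MiB,
                       min_size=1, requested_maps=1):
    """Estimate map tasks for a list of non-empty input files (sizes in bytes).

    Hypothetical helper: a splittable file contributes roughly one task per
    split, a non-splittable file (e.g. gzip) contributes exactly one task.
    """
    total = sum(file_sizes)
    goal = total // max(requested_maps, 1)
    size = split_size(goal, min_size, block_size)
    tasks = 0
    for n in file_sizes:
        if splittable:
            tasks += max(1, math.ceil(n / size))
        else:
            tasks += 1
    return tasks


# Hypothetical input: four 10 GiB files, e.g. the output of an earlier stage.
files = [10 * GiB] * 4
print(estimate_map_tasks(files, splittable=False))
print(estimate_map_tasks(files, splittable=True))
```

Under these assumptions the same bytes yield very different task counts depending only on whether the files can be split, which is why the answers to the two questions above narrow things down.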

2011/1/7 Tali K <[EMAIL PROTECTED]>:
>
> According to the documentation, that parameter sets the number of
>    tasks *per TaskTracker*.  I am asking about the number of tasks
>    for the entire job and the entire cluster.  That parameter is already
>    set to 3, which is one less than the number of cores on each node's
>    CPU, as recommended.  In my question I stated that
>    82 tasks were run for the first job, yet only 4 for the second,
>    both numbers being cluster-wide.
>
>
>
>> Date: Fri, 7 Jan 2011 13:19:42 -0800
>> Subject: Re: Help: How to increase amont maptasks per job ?
>> From: [EMAIL PROTECTED]
>> To: [EMAIL PROTECTED]
>>
>> Set higher values for mapred.tasktracker.map.tasks.maximum (and
>> mapred.tasktracker.reduce.tasks.maximum) in mapred-site.xml
>>
>> On Fri, Jan 7, 2011 at 12:58 PM, Tali K <[EMAIL PROTECTED]> wrote:
>>
>> >
>> >
>> >
>> >
>> > We have jobs which run in several map/reduce stages.  In the first job,
>> > a large number of map tasks (82) are initiated, as expected,
>> > and that causes all nodes to be used.
>> > In a later job, where we are still dealing with large amounts of
>> > data, only 4 map tasks are initiated, so only 4 nodes are used.
>> > This stage is actually the
>> > workhorse of the job, and requires much more processing power than the
>> > initial stage.
>> > We are trying to understand why only a few map tasks are
>> > being used, as we are not getting the full advantage of our cluster.
>> >
>> >
>> >
>> >
>

--
Kind regards,

Niels Basjes