You said you have a large amount of data.
How large is that approximately?
Did you compress the intermediate data, and if so, with which codec?
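
Why I ask: as far as I know, the number of map tasks for a job follows the number of input splits, not the per-TaskTracker slot settings. Below is a rough sketch of that arithmetic. It assumes FileInputFormat-style splitting, a 64 MB split size, and that a file compressed with a non-splittable codec (gzip, for example) ends up as a single split. The function name and the file sizes are made up for illustration and are not taken from your job.

```python
import math

def estimate_map_tasks(file_sizes_bytes,
                       split_size_bytes=64 * 1024 * 1024,
                       splittable=True):
    """Rough estimate of the map task count for a list of input files.

    Assumptions (not confirmed for any particular job):
    - one map task per input split;
    - a splittable file yields ceil(size / split_size) splits;
    - a non-splittable file yields exactly one split;
    - every file yields at least one split in this sketch.
    Real input formats may deviate slightly from this.
    """
    total = 0
    for size in file_sizes_bytes:
        if splittable:
            total += max(1, math.ceil(size / split_size_bytes))
        else:
            total += 1
    return total

# Hypothetical input: four files of 1 GiB each.
one_gib = 1024 ** 3
print(estimate_map_tasks([one_gib] * 4))                    # splittable input
print(estimate_map_tasks([one_gib] * 4, splittable=False))  # e.g. gzip files
```

With these hypothetical numbers the splittable case gives 64 map tasks and the non-splittable case gives 4. If the input of your second job consists of a few large non-splittable files, that might explain the low task count, but I cannot tell without the answers to the questions above.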
2011/1/7 Tali K <[EMAIL PROTECTED]>:
> According to the documentation, that parameter is for the number of
> tasks *per TaskTracker*. I am asking about the number of tasks
> for the entire job and entire cluster. That parameter is already
> set to 3, which is one less than the number of cores on each node's
> CPU, as recommended. In my question I stated that
> 82 tasks were run for the first job, yet only 4 for the second -
> both numbers being cluster-wide.
>> Date: Fri, 7 Jan 2011 13:19:42 -0800
>> Subject: Re: Help: How to increase amount maptasks per job ?
>> From: [EMAIL PROTECTED]
>> To: [EMAIL PROTECTED]
>> Set higher values for mapred.tasktracker.map.tasks.maximum (and
>> mapred.tasktracker.reduce.tasks.maximum) in mapred-site.xml
>> On Fri, Jan 7, 2011 at 12:58 PM, Tali K <[EMAIL PROTECTED]> wrote:
>> > We have a job which runs in several map/reduce stages. In the first job,
>> > a large number of map tasks (82) are initiated, as expected,
>> > and that causes all nodes to be used.
>> > In a
>> > later job, where we are still dealing with large amounts of
>> > data, only 4 map tasks are initiated, so only 4 nodes are used.
>> > This stage is actually the
>> > workhorse of the job, and requires much more processing power than the
>> > initial stage.
>> > We are trying to understand why only a few map tasks are
>> > being used, as we are not getting the full advantage of our cluster.
Kind regards,