MapReduce >> mail # user >> Re: How to lower the total number of map tasks


Romedius Weiss 2012-10-03, 04:00
Re: How to lower the total number of map tasks

I have followed a suggestion at the given link and set mapred.min.split.size to 134217728.

With the above mapred.min.split.size, I get mapred.map.tasks = 121 (previously it was 242).

Thanks for all the replies !

Shing
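[Editor's note: the 242 → 121 drop matches the Hadoop 1.x FileInputFormat sizing rule, splitSize = max(minSize, min(goalSize, blockSize)). A minimal sketch of that arithmetic follows; the exact input size below is an assumption (242 × 64 MB blocks, consistent with the "about 16 GB" file and the reported task counts), and the 10% "split slop" on the last split is ignored.]

```java
// Sketch of Hadoop 1.x FileInputFormat split sizing, using the numbers
// from this thread. The sizing rule mirrors FileInputFormat.computeSplitSize:
//   splitSize = max(minSize, min(goalSize, blockSize))
// Assumption: the input is exactly 242 blocks of 64 MB (~16 GB), chosen so
// the arithmetic reproduces the reported task counts.
public class SplitMath {

    static long splitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    static long numSplits(long totalSize, long splitSize) {
        // Ceiling division; ignores Hadoop's 10% slop on the final split.
        return (totalSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long blockSize = 67108864L;           // dfs.block.size = 64 MB
        long totalSize = 242L * blockSize;    // ~16 GB input (assumed)
        long goalSize  = totalSize / 2;       // default mapred.map.tasks hint is 2

        // Default mapred.min.split.size = 0: one split per block.
        long s1 = splitSize(goalSize, 0L, blockSize);
        System.out.println(numSplits(totalSize, s1));   // 242

        // After setting mapred.min.split.size = 134217728 (128 MB).
        long s2 = splitSize(goalSize, 134217728L, blockSize);
        System.out.println(numSplits(totalSize, s2));   // 121
    }
}
```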

________________________________
 From: Romedius Weiss <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Wednesday, October 3, 2012 5:00 AM
Subject: Re: How to lower the total number of map tasks
 
Hi!

According to the article @YDN*
"The on-node parallelism is controlled by the  mapred.tasktracker.map.tasks.maximum parameter."

[http://developer.yahoo.com/hadoop/tutorial/module4.html]

Also, I think it's better to set the min size instead of the max size, so the algorithm tries to slice the file into chunks of a certain minimal size.

Have you tried writing a custom InputFormat? That might be another, more drastic, solution.

Cheers, R
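[Editor's note: concretely, the minimum-split-size approach suggested above would go into mapred-site.xml as shown below. The property names are the Hadoop 1.x ones used in this thread; the values are illustrative only (128 MB min split, 4 map slots per TaskTracker), not a recommendation.]

```xml
<!-- mapred-site.xml fragment (Hadoop 1.x property names; values illustrative) -->
<property>
  <name>mapred.min.split.size</name>
  <value>134217728</value> <!-- 128 MB: splits will be at least this large -->
</property>
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value> <!-- on-node parallelism, per the YDN tutorial quoted above -->
</property>
```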
Zitat von Shing Hing Man <[EMAIL PROTECTED]>:

> I only have one big input file.
>
> Shing
>
>
> ________________________________
>  From: Bejoy KS <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]; Shing Hing Man <[EMAIL PROTECTED]>
> Sent: Tuesday, October 2, 2012 6:46 PM
> Subject: Re: How to lower the total number of map tasks
>
>
> Hi Shing
>
> Is your input a single file or a set of small files? If the latter, you need to use CombineFileInputFormat.
>
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
> ________________________________
>
> From:  Shing Hing Man <[EMAIL PROTECTED]>
> Date: Tue, 2 Oct 2012 10:38:59 -0700 (PDT)
> To: [EMAIL PROTECTED]<[EMAIL PROTECTED]>
> ReplyTo:  [EMAIL PROTECTED]
> Subject: Re: How to lower the total number of map tasks
>
>
> I have tried
>        conf.setInt("mapred.max.split.size", 134217728);
>
> and setting mapred.max.split.size in mapred-site.xml. ( dfs.block.size is left unchanged at 67108864).
>
> But in the job.xml, I am still getting mapred.map.tasks =242 .
>
> Shing
>
>
>
>
>
>
> ________________________________
>  From: Bejoy Ks <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]; Shing Hing Man <[EMAIL PROTECTED]>
> Sent: Tuesday, October 2, 2012 6:03 PM
> Subject: Re: How to lower the total number of map tasks
>
>
> Sorry for the typo, the property name is mapred.max.split.size
>
> Also just for changing the number of map tasks you don't need to modify the hdfs block size.
>
>
> On Tue, Oct 2, 2012 at 10:31 PM, Bejoy Ks <[EMAIL PROTECTED]> wrote:
>
> Hi
>>
>>
>> You need to set mapred.max.split.size to a value larger than your block size to get fewer map tasks than the default.
>>
>>
>>
>> On Tue, Oct 2, 2012 at 10:04 PM, Shing Hing Man <[EMAIL PROTECTED]> wrote:
>>
>>
>>>
>>>
>>> I am running Hadoop 1.0.3 in pseudo-distributed mode.
>>> When I submit a map/reduce job to process a file of about 16 GB, job.xml shows the following:
>>>
>>>
>>> mapred.map.tasks =242
>>> mapred.min.split.size =0
>>> dfs.block.size = 67108864
>>>
>>>
>>> I would like to reduce mapred.map.tasks to see if it improves performance.
>>> I have tried doubling dfs.block.size, but mapred.map.tasks remains unchanged.
>>> Is there a way to reduce mapred.map.tasks?
>>>
>>>
>>> Thanks in advance for any assistance!
>>> Shing
>>>
>>>
>>