Re: How to lower the total number of map tasks
Hi!

According to the article @YDN*
"The on-node parallelism is controlled by the  
mapred.tasktracker.map.tasks.maximum parameter."

[http://developer.yahoo.com/hadoop/tutorial/module4.html]

Also I think it's better to set the min size instead of the max size,  
so the algorithm tries to slice the file into chunks of a certain  
minimal size.

Have you tried to make a custom InputFormat? Might be another more  
drastic solution.

Cheers, R
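For reference, the min-size suggestion above could look like this as a config fragment (a sketch assuming the Hadoop 1.x property name mapred.min.split.size; the 128 MB value is illustrative, chosen to be double a 64 MB block size):

```xml
<!-- Sketch: raise the minimum split size to 128 MB so each map task
     reads at least two 64 MB blocks; the value here is illustrative. -->
<property>
  <name>mapred.min.split.size</name>
  <value>134217728</value>
</property>
```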
Quoting Shing Hing Man <[EMAIL PROTECTED]>:

> I only have one big input file.
>
> Shing
>
>
> ________________________________
>  From: Bejoy KS <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]; Shing Hing Man <[EMAIL PROTECTED]>
> Sent: Tuesday, October 2, 2012 6:46 PM
> Subject: Re: How to lower the total number of map tasks
>
>
> Hi Shing
>
> Is your input a single file or a set of small files? If the latter,  
> you need to use CombineFileInputFormat.
>
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
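CombineFileInputFormat packs many small files into fewer, larger splits. The packing idea can be sketched in plain Java (illustrative only: the real class also respects node and rack locality, and this `pack` helper is hypothetical, not part of the Hadoop API):

```java
import java.util.ArrayList;
import java.util.List;

public class CombinePackingSketch {
    // Greedily pack file sizes into combined splits no larger than
    // maxSplit, mimicking the basic idea behind CombineFileInputFormat.
    // A single file bigger than maxSplit still becomes its own split.
    static List<Long> pack(long[] fileSizes, long maxSplit) {
        List<Long> splits = new ArrayList<>();
        long current = 0;
        for (long size : fileSizes) {
            if (current + size > maxSplit && current > 0) {
                splits.add(current); // close the current split
                current = 0;
            }
            current += size;
        }
        if (current > 0) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        long[] files = {10, 20, 30, 40, 50}; // file sizes, e.g. in MB
        System.out.println(pack(files, 64)); // → [60, 40, 50]
    }
}
```

Five small files become three splits instead of five map tasks, which is the whole point of combining inputs.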
> ________________________________
>
> From:  Shing Hing Man <[EMAIL PROTECTED]>
> Date: Tue, 2 Oct 2012 10:38:59 -0700 (PDT)
> To: [EMAIL PROTECTED]<[EMAIL PROTECTED]>
> ReplyTo:  [EMAIL PROTECTED]
> Subject: Re: How to lower the total number of map tasks
>
>
> I have tried
>        conf.setInt("mapred.max.split.size", 134217728);
>
> and setting mapred.max.split.size in mapred-site.xml. (  
> dfs.block.size is left unchanged at 67108864).
>
> But in the job.xml, I am still getting mapred.map.tasks =242 .
>
> Shing
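The unchanged task count is consistent with how the old-API FileInputFormat computes splits: splitSize = max(minSize, min(goalSize, blockSize)), a formula that never consults mapred.max.split.size. A self-contained sketch of that arithmetic (behavior assumed from Hadoop 1.x source; worth verifying against your version):

```java
public class SplitSizeDemo {
    // Split size as computed by the Hadoop 1.x old-API FileInputFormat:
    // max(minSize, min(goalSize, blockSize)). Note that the max split
    // size is not part of this formula, which would explain why raising
    // it leaves the map task count unchanged.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 67108864L;                 // 64 MB dfs.block.size
        long totalSize = 16L * 1024 * 1024 * 1024;  // ~16 GB input file
        long goalSize = totalSize;                  // numSplits hint of 1

        // Min split left at default: splits stay at the block size.
        System.out.println(totalSize / computeSplitSize(goalSize, 0L, blockSize));           // → 256
        // Min split raised to 128 MB: half as many map tasks.
        System.out.println(totalSize / computeSplitSize(goalSize, 134217728L, blockSize));   // → 128
    }
}
```

Under this formula, raising mapred.min.split.size above the block size is what actually enlarges the splits.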
>
>
>
>
>
>
> ________________________________
>  From: Bejoy Ks <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]; Shing Hing Man <[EMAIL PROTECTED]>
> Sent: Tuesday, October 2, 2012 6:03 PM
> Subject: Re: How to lower the total number of map tasks
>
>
> Sorry for the typo, the property name is mapred.max.split.size
>
> Also just for changing the number of map tasks you don't need to  
> modify the hdfs block size.
>
>
> On Tue, Oct 2, 2012 at 10:31 PM, Bejoy Ks <[EMAIL PROTECTED]> wrote:
>
> Hi
>>
>>
>> You need to alter the value of mapred.max.split.size to a value  
>> larger than your block size to have fewer map tasks than  
>> the default.
>>
>>
>>
>> On Tue, Oct 2, 2012 at 10:04 PM, Shing Hing Man <[EMAIL PROTECTED]> wrote:
>>
>>
>>>
>>>
>>> I am running Hadoop 1.0.3 in pseudo-distributed mode.
>>> When I submit a map/reduce job to process a file of size about  
>>> 16 GB, in job.xml I have the following
>>>
>>>
>>> mapred.map.tasks =242
>>> mapred.min.split.size =0
>>> dfs.block.size = 67108864
>>>
>>>
>>> I would like to reduce mapred.map.tasks to see if it improves  
>>> performance.
>>> I have tried doubling the size of dfs.block.size, but  
>>> mapred.map.tasks remains unchanged.
>>> Is there a way to reduce mapred.map.tasks?
>>>
>>>
>>> Thanks in advance for any assistance!
>>> Shing
>>>
>>>
>>
Shing Hing Man 2012-10-03, 13:50