Re: controlling no. of mapper tasks
Yes, that is correct.  It is indeed looking at the data size.  Please
read through what I wrote again - particularly the part about files
getting broken into chunks (aka "blocks").  If you want fewer map
tasks, store your files in HDFS with a larger block size.  They will
then be stored in fewer blocks/chunks, which will result in fewer map
tasks per job.

DR
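
A minimal sketch of the approach described above, using the Hadoop
FileSystem API. The path, block size, and buffer settings here are
illustrative assumptions, not values from the thread:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    import java.io.IOException;

    public class LargeBlockUpload {
        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // 256 MB instead of the usual 64 MB default; fewer blocks
            // means fewer input splits, and thus fewer map tasks for
            // jobs that read this file.
            long blockSize = 256L * 1024 * 1024;
            int bufferSize = conf.getInt("io.file.buffer.size", 4096);
            short replication = fs.getDefaultReplication();

            FSDataOutputStream out = fs.create(
                    new Path("/data/huge-input.txt"), // illustrative path
                    true,                             // overwrite if present
                    bufferSize,
                    replication,
                    blockSize);
            // ... write the file's bytes to 'out' here ...
            out.close();
        }
    }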

On 06/20/2011 03:44 PM, [EMAIL PROTECTED] wrote:
> Hi David, I think Hadoop is looking at the data size, not the no. of
> input files. If I pass in .gz files, then yes, Hadoop chooses 1 map
> task per file, but if I pass in a HUGE text file or the same file
> split into 10 files, it chooses the same no. of map tasks (191 in my
> case).
>
> Thanks Praveen
>
> -----Original Message----- From: ext David Rosenstrauch
> [mailto:[EMAIL PROTECTED]] Sent: Monday, June 20, 2011 3:39 PM To:
> [EMAIL PROTECTED] Subject: Re: controlling no. of
> mapper tasks
>
> On 06/20/2011 03:24 PM, [EMAIL PROTECTED] wrote:
>> Hi there, I know the client can set "mapred.reduce.tasks" to specify
>> the no. of reduce tasks, and Hadoop honours it, but "mapred.map.tasks"
>> is not honoured by Hadoop. Is there any way to control the number of
>> map tasks? What I noticed is that Hadoop is choosing too many mappers,
>> and this adds extra overhead. For example, when I have only 10 map
>> tasks, my job finishes faster than when Hadoop chooses 191 map tasks.
>> I have a 5-slave cluster, and 10 tasks can run in parallel. I want to
>> set both map and reduce tasks to 10 for maximum efficiency.
>>
>> Thanks Praveen
>
> The number of map tasks is determined dynamically based on the number
> of input chunks you have.  If you want fewer map tasks, either pass
> fewer input files to your job or store the files using larger chunk
> sizes (which will result in fewer chunks per file, and thus fewer
> chunks total).
>
> HTH,
>
> DR
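
For reference, the arithmetic behind those 191 map tasks. This assumes
the default FileInputFormat of the old "mapred" API, where the split
size works out to max(minSplitSize, min(goalSize, blockSize)) and
"mapred.map.tasks" only feeds the goalSize hint - so with default
settings the split size equals the block size. The ~12 GB file size
below is an assumption chosen to reproduce 191 splits at 64 MB blocks:

    public class SplitCount {
        // Ceiling division: a file of fileSize bytes split into chunks
        // of splitSize bytes yields this many input splits, and hence
        // this many map tasks.
        static long numSplits(long fileSize, long splitSize) {
            return (fileSize + splitSize - 1) / splitSize;
        }

        public static void main(String[] args) {
            long mb = 1024L * 1024;
            long fileSize = 191 * 64 * mb; // ~12 GB (assumed input size)
            System.out.println(numSplits(fileSize, 64 * mb));  // 191 map tasks
            System.out.println(numSplits(fileSize, 256 * mb)); // 48 map tasks
        }
    }

This also fits the .gz observation earlier in the thread: gzip is not a
splittable codec, so each .gz file yields exactly one split (and one
map task) regardless of its size.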