MapReduce >> mail # user >> controlling no. of mapper tasks

Re: controlling no. of mapper tasks
Yes, that is correct.  It is indeed looking at the data size.  Please
carefully read through again what I wrote - particularly the part about
files getting broken into chunks (aka "blocks").  If you want fewer map
tasks, then store your files in HDFS with a larger block size.  They
will then get stored in fewer blocks/chunks, and will result in fewer
map tasks per job.
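
To make the arithmetic concrete, here is a rough sketch of how the map count follows from file size and block size. It is a simplification, not Hadoop's actual code: it ignores the small slop factor FileInputFormat applies to the last split as well as any min/max split-size settings, and the function name and file sizes are made up for illustration.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def estimate_map_tasks(file_sizes, block_size, splittable=True):
    """Roughly estimate the number of map tasks for a set of input files.

    Assumes one map task per block-sized chunk of each file. Files in a
    non-splittable format (e.g. .gz) get one map task each, regardless
    of size. All sizes are in bytes and assumed to be positive.
    """
    if not splittable:
        return len(file_sizes)
    # Ceiling division: a partial trailing block still needs its own task.
    return sum((size + block_size - 1) // block_size for size in file_sizes)


# Hypothetical input: 10 GiB of text, as one file or as ten 1 GiB files.
one_big_file = [10 * GIB]
ten_files = [1 * GIB] * 10

print(estimate_map_tasks(one_big_file, 64 * MIB))         # 160
print(estimate_map_tasks(ten_files, 64 * MIB))            # 160
print(estimate_map_tasks(one_big_file, 256 * MIB))        # 40
print(estimate_map_tasks(ten_files, 64 * MIB, False))     # 10
```

Splitting the same data into more files does not change the count as long as each file is still a whole number of blocks, which matches what was reported. Only a larger block size (or a non-splittable format) brings the number down.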


On 06/20/2011 03:44 PM, [EMAIL PROTECTED] wrote:
> Hi David,
>
> I think Hadoop is looking at the data size, not the number of input
> files. If I pass in .gz files, then yes, Hadoop chooses 1 map task per
> file, but if I pass in a HUGE text file, or the same file split into 10
> files, it chooses the same number of map tasks (191 in my case).
>
> Thanks,
> Praveen
>
> -----Original Message-----
> From: ext David Rosenstrauch [mailto:[EMAIL PROTECTED]]
> Sent: Monday, June 20, 2011 3:39 PM
> To: [EMAIL PROTECTED]
> Subject: Re: controlling no. of mapper tasks
>
> On 06/20/2011 03:24 PM, [EMAIL PROTECTED] wrote:
>> Hi there,
>>
>> I know the client can set "mapred.reduce.tasks" to specify the number
>> of reduce tasks, and Hadoop honours it, but "mapred.map.tasks" is not
>> honoured by Hadoop. Is there any way to control the number of map
>> tasks? What I noticed is that Hadoop is choosing too many mappers,
>> and this adds extra overhead. For example, when I have only 10 map
>> tasks, my job finishes faster than when Hadoop chooses 191 map tasks.
>> I have a cluster of 5 slaves, and 10 tasks can run in parallel. I want
>> to set both map and reduce tasks to 10 for maximum efficiency.
>>
>> Thanks,
>> Praveen
>
> The number of map tasks is determined dynamically based on the number
> of input chunks you have.  If you want fewer map tasks, either pass
> fewer input files to your job, or store the files using larger chunk
> sizes (which will result in fewer chunks per file, and thus fewer
> chunks total).
> HTH,
> DR
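
As for why "mapred.map.tasks" looks like it is being ignored: in the old org.apache.hadoop.mapred.FileInputFormat, as far as I recall, the requested map count only feeds into a "goal size" that is then capped at the block size, while "mapred.min.split.size" acts as a floor. The sketch below mirrors that rule in plain Python. The function and variable names are mine, not Hadoop's, and the input sizes are hypothetical, so treat it as an illustration rather than the real implementation.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def compute_split_size(total_size, requested_maps, min_split_size, block_size):
    """Approximate the old-API split-size rule.

    goal size  = total input bytes / requested number of maps (the hint)
    split size = max(min split size, min(goal size, block size))
    """
    goal_size = total_size // max(requested_maps, 1)
    return max(min_split_size, min(goal_size, block_size))


total = 10 * GIB      # hypothetical total input size
block = 64 * MIB      # hypothetical HDFS block size

# Asking for 10 maps: the goal size (1 GiB) is capped at the block size,
# so the split size stays at 64 MiB and the hint has no visible effect.
print(compute_split_size(total, 10, 1, block) // MIB)         # 64

# Raising the minimum split size above the block size does take effect.
print(compute_split_size(total, 10, 1 * GIB, block) // MIB)   # 1024
```

Under these assumptions the hint can only push the map count up, never below one map per block. Larger blocks, or a larger minimum split size at the cost of some data locality, are the knobs that reduce it.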