MapReduce, mail # user - why is num of map tasks gets overridden?


Re: why is num of map tasks gets overridden?
Bertrand Dechoux 2012-08-21, 12:52
>
> Actually controlling the number of maps is subtle. The mapred.map.tasks
> parameter is just a hint to the InputFormat for the number of maps. The
> default InputFormat behavior is to split the total number of bytes into the
> right number of fragments. However, in the default case the DFS block size
> of the input files is treated as an upper bound for input splits. A lower
> bound on the split size can be set via mapred.min.split.size. Thus, if you
> expect 10TB of input data and have 128MB DFS blocks, you'll end up with 82k
> maps, unless your mapred.map.tasks is even larger. Ultimately the
> InputFormat <http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/InputFormat.html> determines the number of maps.
>
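The rule in the quoted wiki passage comes down to a little arithmetic. In the old API (org.apache.hadoop.mapred.FileInputFormat), each split is sized as max(minSize, min(goalSize, blockSize)), where goalSize is the total input size divided by the mapred.map.tasks hint. The sketch below is a minimal model of that formula and nothing more. The function names are hypothetical, and the model ignores per-file splitting and the small "slop" allowance on the last split.

```python
MB = 1024 ** 2
TB = 1024 ** 4

def split_size(goal_size, min_size, block_size):
    # The block size caps the split; mapred.min.split.size sets a floor.
    return max(min_size, min(goal_size, block_size))

def estimate_maps(total_bytes, map_tasks_hint, min_size, block_size):
    # goalSize: the bytes per map implied by the mapred.map.tasks hint.
    goal_size = total_bytes // max(map_tasks_hint, 1)
    size = split_size(goal_size, min_size, block_size)
    # Ceiling division: a partial trailing chunk still needs its own map.
    return -(-total_bytes // size)

# 10TB of input, 128MB blocks, a hint of 8 maps, 1-byte minimum split.
print(estimate_maps(10 * TB, 8, 1, 128 * MB))          # 81920
# Raising the minimum split size above the block size halves the count.
print(estimate_maps(10 * TB, 8, 256 * MB, 128 * MB))   # 40960
```

Dividing 10TB by 128MB gives 10 × 2^13 = 81920, which is the "82k maps" in the quote. A hint of 8 cannot push the count below that number because the block size still caps each split. Raising the hint can only shrink goalSize below the block size and add maps. To get fewer maps, raise mapred.min.split.size above the block size.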

http://wiki.apache.org/hadoop/HowManyMapsAndReduces

Bertrand

On Tue, Aug 21, 2012 at 2:19 PM, nutch buddy <[EMAIL PROTECTED]> wrote:

> I configured a job in Hadoop and set the number of map tasks to 8 in the code.
>
> Then I ran the job and it got 152 map tasks. I can't understand why it's being
> overridden or where the 152 comes from.
>
> mapred-site.xml sets mapred.map.tasks to 24.
>
> Any idea?
>
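This also explains why a small hint such as 8 can still produce many more maps. The goal size is computed once for the whole input, but splits are cut file by file, so every non-empty file contributes at least one map. The sketch below is minimal and rests on several assumptions: it follows the old-API FileInputFormat splitting loop with its 1.1 "slop" factor, treats every file as non-empty and splittable, and uses a single block size throughout. count_splits and the file sizes are hypothetical and do not describe the job in the question.

```python
MB = 1024 ** 2
SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split_size

def count_splits(file_sizes, map_tasks_hint, min_size, block_size):
    total = sum(file_sizes)
    # The goal size is computed once, over all input files together.
    goal_size = total // max(map_tasks_hint, 1)
    split_size = max(min_size, min(goal_size, block_size))
    splits = 0
    for remaining in file_sizes:
        # Cut full-size splits while more than SPLIT_SLOP splits' worth remains.
        while remaining / split_size > SPLIT_SLOP:
            splits += 1
            remaining -= split_size
        # Whatever is left of this file becomes one final, possibly short, split.
        if remaining != 0:
            splits += 1
    return splits

# Hypothetical input: twenty 10MB files plus one 300MB file, hint of 8.
files = [10 * MB] * 20 + [300 * MB]
print(count_splits(files, 8, 1, 128 * MB))  # 25
```

In this example, a hint of 8 makes goalSize 62.5MB. Each 10MB file still gets its own split, and the 300MB file is cut into five, so the job ends up with 25 maps instead of 8.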

--
Bertrand Dechoux