Re: why is the number of map tasks getting overridden?
>
> Actually controlling the number of maps is subtle. The mapred.map.tasks
> parameter is just a hint to the InputFormat for the number of maps. The
> default InputFormat behavior is to split the total number of bytes into the
> right number of fragments. However, in the default case the DFS block size
> of the input files is treated as an upper bound for input splits. A lower
> bound on the split size can be set via mapred.min.split.size. Thus, if you
> expect 10TB of input data and have 128MB DFS blocks, you'll end up with 82k
> maps, unless your mapred.map.tasks is even larger. Ultimately the
> InputFormat (http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/InputFormat.html) determines the number of maps.
>

http://wiki.apache.org/hadoop/HowManyMapsAndReduces
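
To make the quoted explanation concrete, here is a minimal sketch using the old mapred API. The class name, the input/output paths taken from args, and the 1 GB minimum split size are illustrative assumptions, not something stated in this thread; likewise, 152 maps would simply be consistent with the input spanning about 152 DFS blocks (roughly 19 GB at 128 MB per block), though the thread does not give the actual input size.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class MapCountSketch {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(MapCountSketch.class);
    conf.setJobName("map-count-sketch");

    // Only a hint: the InputFormat still decides the real number of splits.
    conf.setNumMapTasks(8);

    // Raising the minimum split size is what actually lowers the map count:
    // with 128 MB blocks, a 1 GB minimum groups roughly 8 blocks per split.
    conf.setLong("mapred.min.split.size", 1024L * 1024 * 1024);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
  }
}

The default identity mapper and reducer make this runnable as-is; the point is only which of the two settings the framework actually honours.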

Bertrand

On Tue, Aug 21, 2012 at 2:19 PM, nutch buddy <[EMAIL PROTECTED]> wrote:

> I configure a job in Hadoop and set the number of map tasks in the code to 8.
>
> Then I run the job and it gets 152 map tasks. I can't see why it's being
> overridden or where it gets 152 from.
>
> The mapred-site.xml has mapred.map.tasks set to 24.
>
> Any idea?
>

--
Bertrand Dechoux