Re: increase number of map tasks
Hi Satish,
      What is your value for mapred.max.split.size? Try setting these
values as well:
mapred.min.split.size=0 (this is the default value)
mapred.max.split.size=40

Try executing your job once you have applied these changes on top of the
others you made.
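
In mapred-site.xml, or in the job configuration, these take the same
property form as the ones in your mail below:

<property>
 <name>mapred.min.split.size</name>
 <value>0</value>
</property>

<property>
 <name>mapred.max.split.size</name>
 <value>40</value>
</property>

If I recall correctly, the new-API FileInputFormat computes the split
size as max(minSplitSize, min(maxSplitSize, blockSize)), so capping
mapred.max.split.size at 40 should give you roughly one split per
40-byte record.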

Regards
Bejoy.K.S

On Mon, Jan 9, 2012 at 10:16 PM, sset <[EMAIL PROTECTED]> wrote:

>
> Hello,
>
> In HDFS we have set the block size to 40 bytes. The input data set is as
> below, each record terminated with a line feed.
>
> data1   (5*8=40 bytes)
> data2
> ......
> .......
> data10
>
>
> But we still see only 2 map tasks spawned; there should have been at least
> 10 map tasks. Each mapper performs a complex mathematical computation, and
> we are not sure how the splitting works internally. Splitting on line feeds
> does not work, and even with the settings below the number of map tasks
> never goes beyond 2. Is there any way to make this spawn 10 tasks?
> Basically it should work like a compute grid, with the computations running
> in parallel.
>
> <property>
>  <name>io.bytes.per.checksum</name>
>  <value>30</value>
>  <description>The number of bytes per checksum.  Must not be larger than
>  io.file.buffer.size.</description>
> </property>
>
> <property>
>  <name>dfs.block.size</name>
>   <value>30</value>
>  <description>The default block size for new files.</description>
> </property>
>
> <property>
>  <name>mapred.tasktracker.map.tasks.maximum</name>
>  <value>10</value>
>  <description>The maximum number of map tasks that will be run
>  simultaneously by a task tracker.
>  </description>
> </property>
>
> This is a single node with a high configuration: 8 CPUs and 8 GB of memory.
> Hence we are taking an example of 10 data items separated by line feeds. We
> want to utilize the full power of the machine, so we want at least 10 map
> tasks, each performing a highly complex mathematical simulation. At present
> it looks like split size (in bytes) is the only way to control the number
> of map tasks from the file data, but I would prefer a criterion like line
> feeds or similar (see the NLineInputFormat sketch after this message).
>
> How do we get 10 map tasks from the above configuration? Please help.
>
> thanks
>
> --
> View this message in context:
> http://old.nabble.com/increase-number-of-map-tasks-tp33107775p33107775.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
>
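
On the "criterion like line feeds" point: NLineInputFormat does exactly
that. It hands each mapper a fixed number of input lines rather than a
byte range, so a 10-line file gives you 10 map tasks irrespective of
block or split size. A minimal driver sketch with the old mapred API
(SimulationMapper is just a placeholder for your own mapper class):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.NLineInputFormat;

public class SimulationDriver {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(SimulationDriver.class);
    conf.setJobName("per-line-simulation");

    // One input line per mapper: a 10-line input file yields 10 map
    // tasks regardless of dfs.block.size or split-size settings.
    conf.setInputFormat(NLineInputFormat.class);
    conf.setInt("mapred.line.input.format.linespermap", 1);

    conf.setMapperClass(SimulationMapper.class); // placeholder mapper
    conf.setNumReduceTasks(0);                   // map-only compute job

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}

As for why you see exactly 2 maps: with the old API, FileInputFormat
sizes splits from a goal of totalSize / mapred.map.tasks, and
mapred.map.tasks defaults to 2, which is consistent with what you are
seeing.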