Re: number of mapper tasks
Sorry for asking so many questions, but the answers are really helping.
2013/1/28 Harsh J <[EMAIL PROTECTED]>

> This seems CPU-oriented. You probably want the NLineInputFormat? See
>
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/NLineInputFormat.html
> .
> This should let you spawn more maps as well, based on your N factor.
>
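
As a rough illustration of the "N factor" mentioned above: with a line-count-based input format such as NLineInputFormat, each split carries at most N lines, so the number of map tasks is about the line count divided by N, rounded up. The sketch below is plain Java with no Hadoop dependency. The class name, method name, and the input size of 1,000,000 lines are hypothetical, chosen only for illustration. For a multiline CSV reader, "lines" would have to be read as "records", which is an assumption about the custom format.

```java
// Sketch only: models how many map tasks a line-count-based split strategy yields.
// LineSplitMath and numSplits are hypothetical names, not part of Hadoop.
final class LineSplitMath {
    private LineSplitMath() {}

    /** Number of splits (one map task each) for totalLines at linesPerSplit lines per split. */
    static long numSplits(long totalLines, int linesPerSplit) {
        if (totalLines < 0 || linesPerSplit <= 0) {
            throw new IllegalArgumentException("totalLines must be >= 0 and linesPerSplit > 0");
        }
        // Ceiling division: a final, partially filled split still needs its own map task.
        return totalLines / linesPerSplit + (totalLines % linesPerSplit == 0 ? 0 : 1);
    }

    public static void main(String[] args) {
        long totalLines = 1_000_000L; // assumed input size, for illustration only
        for (int n : new int[] {1_000_000, 100_000, 10_000}) {
            System.out.println(n + " lines per split -> " + numSplits(totalLines, n) + " map tasks");
        }
    }
}
```

Under these assumptions, a smaller N gives more splits and therefore more map tasks that can occupy free slots. In the newer `org.apache.hadoop.mapreduce` API, N is set on the job with `NLineInputFormat.setNumLinesPerSplit(job, n)`.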

Indeed, CPU is my bottleneck. That's why I want more things in parallel.
Actually, I wrote my own InputFormat, to be able to process multiline CSVs:
https://github.com/mvallebr/CSVInputFormat
I could change it to read several lines at a time, but would that alone
allow more tasks to run in parallel?

> Not really - "Slots" are capacities, rather than split factors
> themselves. You can have N slots always available, but your job has to
> supply as many map tasks (based on its input/needs/etc.) to use them
> up.
>
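
The point about slots being capacities can be put in numbers. The sketch below is plain Java with no Hadoop dependency. The class and method names and the slot and task counts are hypothetical. It shows that the number of maps running at once is bounded by both the free slots and the number of map tasks the job supplies, which in turn equals the number of input splits.

```java
// Sketch only: slots are a capacity; the job's input splits determine the map task count.
// SlotMath and its methods are hypothetical names, not part of Hadoop.
final class SlotMath {
    private SlotMath() {}

    /** Maps that can run at the same time: limited by both slots and supplied tasks. */
    static int concurrentMaps(int mapSlots, int mapTasks) {
        return Math.min(mapSlots, mapTasks);
    }

    /** Number of "waves" needed to run all map tasks on the given slots. */
    static int waves(int mapSlots, int mapTasks) {
        if (mapSlots <= 0 || mapTasks < 0) {
            throw new IllegalArgumentException("mapSlots must be > 0 and mapTasks >= 0");
        }
        return mapTasks / mapSlots + (mapTasks % mapSlots == 0 ? 0 : 1);
    }

    public static void main(String[] args) {
        int slots = 16; // assumed cluster-wide map slot count, for illustration only
        System.out.println("2 tasks:  " + concurrentMaps(slots, 2) + " running at once");
        System.out.println("64 tasks: " + concurrentMaps(slots, 64) + " running at once, in "
                + waves(slots, 64) + " waves");
    }
}
```

With only 2 splits, 14 of the 16 assumed slots stay idle no matter how the cluster is configured. Producing more splits from the InputFormat is what fills them.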

But how can I do that (supply more map tasks) in my job? By changing its
code, or through the Hadoop config?
> Unless your job sets the number of reducers to 0 manually, 1 default
> reducer is always run that waits to see if it has any outputs from
> maps. If it does not receive any outputs after maps have all
> completed, it dies out with behavior equivalent to a NOP.
>

OK, I added job.setNumReduceTasks(0); I guess that will solve this part.
Thanks!

--
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr