MapReduce user mailing list: number of mapper tasks

Thread:
  Marcelo Elias Del Valle    2013-01-28, 15:54
  Harsh J                    2013-01-28, 16:02
  Marcelo Elias Del Valle    2013-01-28, 16:31
  Harsh J                    2013-01-28, 16:41
  Marcelo Elias Del Valle    2013-01-28, 16:55  (message shown below)
Re: number of mapper tasks
Just to complement my last question: I have implemented the getSplits
method in my input format:
https://github.com/mvallebr/CSVInputFormat/blob/master/src/main/java/org/apache/hadoop/mapreduce/lib/input/CSVNLineInputFormat.java

However, it still doesn't create more than 2 map tasks. Is there anything
I could do to ensure more map tasks are created?

Thanks
Marcelo.
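
For reference, a minimal sketch of what an NLineInputFormat-style getSplits
does: cut each input file into one split per N physical lines, so the
framework gets one map task per chunk. The class name and config key below
are invented for illustration, and the sketch ignores the multiline-record
handling the real CSVNLineInputFormat needs:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.util.LineReader;

public class NLinesSketchInputFormat extends FileInputFormat<LongWritable, Text> {

  // Illustrative config key, not a real Hadoop or CSVInputFormat property.
  public static final String LINES_PER_MAP = "sketch.lines.per.map";

  @Override
  public List<InputSplit> getSplits(JobContext job) throws IOException {
    int n = job.getConfiguration().getInt(LINES_PER_MAP, 1000);
    List<InputSplit> splits = new ArrayList<InputSplit>();
    for (FileStatus status : listStatus(job)) {
      Path file = status.getPath();
      FileSystem fs = file.getFileSystem(job.getConfiguration());
      FSDataInputStream in = fs.open(file);
      LineReader reader = new LineReader(in, job.getConfiguration());
      Text line = new Text();
      long begin = 0, pos = 0;
      int linesInSplit = 0, bytesRead;
      while ((bytesRead = reader.readLine(line)) > 0) {
        pos += bytesRead;
        if (++linesInSplit == n) {
          // Every N lines, emit one split -> one map task for that chunk.
          splits.add(new FileSplit(file, begin, pos - begin, new String[0]));
          begin = pos;
          linesInSplit = 0;
        }
      }
      if (pos > begin) {
        // Tail of the file that did not fill a whole split.
        splits.add(new FileSplit(file, begin, pos - begin, new String[0]));
      }
      reader.close();
    }
    return splits;
  }

  @Override
  public RecordReader<LongWritable, Text> createRecordReader(
      InputSplit split, TaskAttemptContext context) {
    // A real implementation would return a CSV-aware reader; omitted here.
    throw new UnsupportedOperationException("sketch only");
  }
}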
2013/1/28 Marcelo Elias Del Valle <[EMAIL PROTECTED]>

> Sorry for asking so many questions, but the answers are really helping.
>
>
> 2013/1/28 Harsh J <[EMAIL PROTECTED]>
>
>> This seems CPU-oriented. You probably want the NLineInputFormat? See
>> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/NLineInputFormat.html
>> This should let you spawn more maps as well, based on your N factor.
>>
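For concreteness, a minimal driver sketch wiring the new-API
org.apache.hadoop.mapreduce.lib.input.NLineInputFormat into a job; the
class name, paths, and N = 1000 are placeholder values:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NLineDriver {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "nline-example");
    job.setJarByClass(NLineDriver.class);
    // One split, and thus one map task, per N input lines.
    job.setInputFormatClass(NLineInputFormat.class);
    NLineInputFormat.setNumLinesPerSplit(job, 1000); // N = 1000
    NLineInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // Mapper setup omitted; the identity mapper runs by default.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}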
>
> Indeed, CPU is my bottleneck. That's why I want more things running in
> parallel. Actually, I wrote my own InputFormat to be able to process
> multiline CSVs: https://github.com/mvallebr/CSVInputFormat
> I could change it to read several lines at a time, but would this alone
> allow more tasks to run in parallel?
>
>
>> Not really - "Slots" are capacities, rather than split factors
>> themselves. You can have N slots always available, but your job has to
>> supply as many map tasks (based on its input/needs/etc.) to use them
>> up.
>>
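One way to see the distinction: the number of map tasks equals the number
of splits the job's InputFormat returns, while slots only cap how many of
them run at once. A sketch that prints the split count for a job (names
and paths are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.util.ReflectionUtils;

public class SplitCount {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "split-count");
    FileInputFormat.addInputPath(job, new Path(args[0]));
    // One map task is scheduled per split, however many slots exist.
    InputFormat<?, ?> fmt =
        ReflectionUtils.newInstance(job.getInputFormatClass(), conf);
    System.out.println("map tasks = " + fmt.getSplits(job).size());
  }
}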
>
> But how can I do that (supply map tasks) in my job? By changing its
> code? Through the Hadoop config?
>
>
>> Unless your job sets the number of reducers to 0 manually, 1 default
>> reducer is always run that waits to see if it has any outputs from
>> maps. If it does not receive any outputs after maps have all
>> completed, it dies out with behavior equivalent to a NOP.
>>
> OK, I did job.setNumReduceTasks(0); I guess this will solve this part,
> thanks!
>
>
> --
> Marcelo Elias Del Valle
> http://mvalle.com - @mvallebr
>

--
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr
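
On the last point above: a minimal map-only driver sketch (names are
placeholders); with zero reducers, the shuffle and reduce phases are
skipped and map output is written directly by the output format:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class MapOnlyDriver {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "map-only");
    // Zero reducers: mappers write final output directly, no shuffle.
    job.setNumReduceTasks(0);
    // Input/output formats and the mapper class would be set as usual.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}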
Later replies in this thread:
  Vinod Kumar Vavilapalli    2013-01-29, 02:08
  Marcelo Elias Del Valle    2013-01-29, 10:52
  Marcelo Elias Del Valle    2013-01-29, 12:53