

Re: Why is Hadoop always running just 4 tasks?
mapred.map.tasks is only a hint to the InputFormat (see
http://wiki.apache.org/hadoop/HowManyMapsAndReduces), and it is ignored in
your case.
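
To make the "hint" concrete, here is a rough Python sketch of how the number
of splits can come out. It is loosely modeled on the split-size rule of the
old-API FileInputFormat, max(minSize, min(goalSize, blockSize)). The function
names, the 64 MB block size and the rounding are assumptions for illustration;
this is not Hadoop code:

```python
def split_size(total_size, num_maps_hint, block_size, min_size=1):
    # The hint only sets a "goal" size; the block size caps it.
    goal_size = total_size // max(num_maps_hint, 1)
    return max(min_size, min(goal_size, block_size))


def count_splits(file_sizes, num_maps_hint, block_size, splittable):
    if not splittable:
        # An unsplittable file becomes exactly one split, whatever the hint says.
        return len(file_sizes)
    size = split_size(sum(file_sizes), num_maps_hint, block_size)
    # Ceiling division per file (the real code lets the last split run a bit larger).
    return sum(-(-s // size) for s in file_sizes)


GIB = 1024 ** 3
MIB = 1024 ** 2
files = [500 * GIB] * 4  # four 500G inputs, as in this thread

print(count_splits(files, 96, 64 * MIB, splittable=False))  # 4
print(count_splits(files, 96, 64 * MIB, splittable=True))   # 32000
```

Under these assumptions the hint of 96 changes nothing in either case: for
unsplittable files the file count decides, and for splittable files the block
size does.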

You are processing gz files. InputFormat has an isSplitable method that
returns false for gz files, so each map task processes a whole file. This
is inherent to gzip: you cannot uncompress just a part of a gzipped file;
to uncompress it, you must read it from the beginning to the end. With 4
input files you therefore get 4 map tasks.
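
The gzip restriction is easy to reproduce outside Hadoop. Below is a small
Python sketch (standard library only; the helper name and the sample data are
made up for illustration) that tries to decompress a gzip stream from its
start and from an offset in the middle:

```python
import gzip
import zlib


def can_decompress_from(blob, offset):
    """Return True if the bytes from `offset` onward decompress as gzip."""
    try:
        gzip.decompress(blob[offset:])
        return True
    except (OSError, EOFError, zlib.error):
        # No gzip header at this position, or a corrupt/truncated stream.
        return False


data = b"".join(b"record %d\n" % i for i in range(10000))
blob = gzip.compress(data)

print(can_decompress_from(blob, 0))               # whole stream
print(can_decompress_from(blob, len(blob) // 2))  # starting mid-file
```

The first call succeeds; the second is expected to fail, because there is no
gzip header in the middle of the stream. A map task handed only the second
half of a .gz file would be in the same position.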

2013/12/11 Dror, Ittay <[EMAIL PROTECTED]>

> Thank you.
>
> The command is:
> hadoop jar /tmp/Algo-0.0.1.jar com.twitter.scalding.Tool com.akamai.Algo
> --hdfs --header --input /algo/input{0..3}.gz --output /algo/output
>
> Btw, the Hadoop version is 1.2.1
>
> Not sure what driver you are referring to.
> Regards,
> Ittay
>
> From: Mirko Kämpf <[EMAIL PROTECTED]>
> Reply-To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Date: Wednesday, December 11, 2013 6:21 PM
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Subject: Re: Why is Hadoop always running just 4 tasks?
>
> Hi,
>
> What command do you execute to submit the job?
> Please also share the driver code,
> so we can troubleshoot better.
>
> Best wishes
> Mirko
>
>
>
>
> 2013/12/11 Dror, Ittay <[EMAIL PROTECTED]>
>
>> I have a cluster of 4 machines with 24 cores and 7 disks each.
>>
>> On each node I copied a 500G file from the local filesystem into HDFS,
>> so I have 4 files in HDFS with many blocks. My replication factor is 1.
>>
>> I run a job (a scalding flow) and while there are 96 reducers pending,
>> there are only 4 active map tasks.
>>
>> What am I doing wrong? The configuration is below.
>>
>> Thanks,
>> Ittay
>>
>> <configuration>
>>   <property>
>>     <name>mapred.job.tracker</name>
>>     <value>master:54311</value>
>>   </property>
>>
>>   <property>
>>     <name>mapred.map.tasks</name>
>>     <value>96</value>
>>   </property>
>>
>>   <property>
>>     <name>mapred.reduce.tasks</name>
>>     <value>96</value>
>>   </property>
>>
>>   <property>
>>     <name>mapred.local.dir</name>
>>     <value>/hdfs/0/mapred/local,/hdfs/1/mapred/local,/hdfs/2/mapred/local,/hdfs/3/mapred/local,/hdfs/4/mapred/local,/hdfs/5/mapred/local,/hdfs/6/mapred/local,/hdfs/7/mapred/local</value>
>>   </property>
>>
>>   <property>
>>     <name>mapred.tasktracker.map.tasks.maximum</name>
>>     <value>24</value>
>>   </property>
>>
>>   <property>
>>     <name>mapred.tasktracker.reduce.tasks.maximum</name>
>>     <value>24</value>
>>   </property>
>> </configuration>
>>
>
>