Re: Why is Hadoop always running just 4 tasks?
mapred.map.tasks is only a hint to the InputFormat (see
http://wiki.apache.org/hadoop/HowManyMapsAndReduces), and in your case it is
ignored.
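As a back-of-the-envelope illustration of why the hint cannot help here, the
sketch below paraphrases the split-size arithmetic of FileInputFormat.getSplits
in Hadoop 1.x. It is not the exact source; the class name is made up, and the
numbers come from this thread and the 1.x defaults:

public class SplitSizeEstimate {
  public static void main(String[] args) {
    long numSplits = 96;                             // mapred.map.tasks: only a goal
    long totalSize = 4L * 500 * 1024 * 1024 * 1024;  // four ~500 GB input files
    long blockSize = 64L * 1024 * 1024;              // dfs.block.size default in 1.x
    long minSize   = 1;                              // mapred.min.split.size default

    long goalSize  = totalSize / numSplits;          // ~22 GB per requested split
    long splitSize = Math.max(minSize, Math.min(goalSize, blockSize));
    System.out.println("split size for splittable input: " + splitSize);  // 64 MB

    // A splittable 2 TB input would be carved into about 32,000 splits of
    // 64 MB each. A non-splittable .gz file is returned as a single split,
    // so the hint never comes into play: 4 files give exactly 4 map tasks.
  }
}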

You are processing gz files, and the InputFormat has an isSplitable method
that returns false for gz files, so each map task processes a whole file.
This is inherent to gzip: you cannot decompress an arbitrary part of a
gzipped file; to decompress it, you must read it from the beginning to the
end.
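For reference, here is a minimal sketch of the kind of check involved,
paraphrasing the behaviour of TextInputFormat on the Hadoop 1.x mapred API.
The class name is made up and this is not the exact source:

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

public class GzipAwareInputFormat extends FileInputFormat<LongWritable, Text> {

  @Override
  protected boolean isSplitable(FileSystem fs, Path file) {
    // Look up a codec by file extension; ".gz" maps to GzipCodec, which
    // cannot be decompressed from an arbitrary offset, so the whole file
    // must go to a single map task.
    // (The real TextInputFormat additionally allows splittable codecs
    // such as bzip2.)
    CompressionCodec codec =
        new CompressionCodecFactory(fs.getConf()).getCodec(file);
    return codec == null;  // split only plain, uncompressed files
  }

  @Override
  public RecordReader<LongWritable, Text> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) {
    // Record reading is irrelevant to the splittability question and is
    // left unimplemented in this sketch.
    throw new UnsupportedOperationException("illustration only");
  }
}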
2013/12/11 Dror, Ittay <[EMAIL PROTECTED]>

> Thank you.
>
> The command is:
> hadoop jar /tmp/Algo-0.0.1.jar com.twitter.scalding.Tool com.akamai.Algo
> --hdfs --header --input /algo/input{0..3}.gz --output /algo/output
>
> Btw, the Hadoop version is 1.2.1
>
> Not sure what driver you are referring to.
> Regards,
> Ittay
>
> From: Mirko Kämpf <[EMAIL PROTECTED]>
> Reply-To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Date: Wednesday, December 11, 2013 6:21 PM
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Subject: Re: Why is Hadoop always running just 4 tasks?
>
> Hi,
>
> What is the command you execute to submit the job?
> Please also share the driver code,
> so we can troubleshoot better.
>
> Best wishes
> Mirko
>
>
>
>
> 2013/12/11 Dror, Ittay <[EMAIL PROTECTED]>
>
>> I have a cluster of 4 machines with 24 cores and 7 disks each.
>>
>> On each node I copied a 500G file from local disk into HDFS, so I have 4
>> files in HDFS, each made up of many blocks. My replication factor is 1.
>>
>> I run a job (a scalding flow) and while there are 96 reducers pending,
>> there are only 4 active map tasks.
>>
>> What am I doing wrong? Below is the configuration:
>>
>> Thanks,
>> Ittay
>>
>> <configuration>
>>   <property>
>>     <name>mapred.job.tracker</name>
>>     <value>master:54311</value>
>>   </property>
>>
>>   <property>
>>     <name>mapred.map.tasks</name>
>>     <value>96</value>
>>   </property>
>>
>>   <property>
>>     <name>mapred.reduce.tasks</name>
>>     <value>96</value>
>>   </property>
>>
>>   <property>
>>     <name>mapred.local.dir</name>
>>     <value>/hdfs/0/mapred/local,/hdfs/1/mapred/local,/hdfs/2/mapred/local,/hdfs/3/mapred/local,/hdfs/4/mapred/local,/hdfs/5/mapred/local,/hdfs/6/mapred/local,/hdfs/7/mapred/local</value>
>>   </property>
>>
>>   <property>
>>     <name>mapred.tasktracker.map.tasks.maximum</name>
>>     <value>24</value>
>>   </property>
>>
>>   <property>
>>     <name>mapred.tasktracker.reduce.tasks.maximum</name>
>>     <value>24</value>
>>   </property>
>> </configuration>
>>
>
>