MapReduce >> mail # user >> Why is Hadoop always running just 4 tasks?


Re: Why is Hadoop always running just 4 tasks?
I am not sure whether Hadoop detects that. My guess is that it will run one
map task per file. Please let me know if I am wrong.
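The behavior discussed in this thread can be sketched with a toy model (illustrative only, not Hadoop's actual split logic; BLOCK_SIZE assumes the Hadoop 1.x default of 64 MB):

```python
# Toy model (not Hadoop source) of how FileInputFormat decides map-task
# counts for splittable vs. non-splittable (e.g. gzip) inputs.

BLOCK_SIZE = 64 * 1024 * 1024  # assumed Hadoop 1.x default HDFS block size

def count_map_tasks(file_sizes, splittable):
    """One split per file if not splittable; otherwise roughly one per block."""
    if not splittable:
        # Each .gz file must be read whole, start to finish, by one mapper,
        # so the mapred.map.tasks hint is effectively ignored.
        return len(file_sizes)
    # Splittable inputs are chopped into block-sized splits.
    return sum(max(1, size // BLOCK_SIZE) for size in file_sizes)

# Four 500 GB .gz files, as in the original post:
sizes = [500 * 1024**3] * 4
print(count_map_tasks(sizes, splittable=False))  # 4 map tasks, one per file
print(count_map_tasks(sizes, splittable=True))   # 32000 block-sized splits
```

This matches the symptom in the thread: four gzip inputs pin the job at four map tasks, no matter what mapred.map.tasks requests.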
2013/12/11 Dror, Ittay <[EMAIL PROTECTED]>

> OK, thank you for the solution.
>
> BTW, I just concatenated several .gz files together with cat (without
> uncompressing first), so they should each decompress individually.
>
>
>
> From: Adam Kawa <[EMAIL PROTECTED]>
> Reply-To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Date: Wednesday, December 11, 2013 9:33 PM
>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Subject: Re: Why is Hadoop always running just 4 tasks?
>
> mapred.map.tasks is rather a hint to InputFormat (
> http://wiki.apache.org/hadoop/HowManyMapsAndReduces) and it is ignored in
> your case.
>
> You process gz files, and the InputFormat's isSplitable method returns
> false for gz files, so each map task processes a whole file (this is
> inherent to gzip: you cannot uncompress just a part of a gzipped file;
> to uncompress it, you must read it from the beginning to the end).
>
>
>
>
> 2013/12/11 Dror, Ittay <[EMAIL PROTECTED]>
>
>> Thank you.
>>
>> The command is:
>> hadoop jar /tmp/Algo-0.0.1.jar com.twitter.scalding.Tool com.akamai.Algo
>> --hdfs --header --input /algo/input{0..3}.gz --output /algo/output
>>
>> Btw, the Hadoop version is 1.2.1
>>
>> Not sure what driver you are referring to.
>> Regards,
>> Ittay
>>
>> From: Mirko Kämpf <[EMAIL PROTECTED]>
>> Reply-To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>> Date: Wednesday, December 11, 2013 6:21 PM
>> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>> Subject: Re: Why is Hadoop always running just 4 tasks?
>>
>> Hi,
>>
>> What is the command you executed to submit the job?
>> Please also share the driver code, so we can troubleshoot better.
>>
>> Best wishes
>> Mirko
>>
>>
>>
>>
>> 2013/12/11 Dror, Ittay <[EMAIL PROTECTED]>
>>
>>> I have a cluster of 4 machines with 24 cores and 7 disks each.
>>>
>>> On each node I copied from local a file of 500G. So I have 4 files in
>>> hdfs with many blocks. My replication factor is 1.
>>>
>>> I run a job (a scalding flow) and while there are 96 reducers pending,
>>> there are only 4 active map tasks.
>>>
>>> What am I doing wrong? Below is the configuration:
>>>
>>> Thanks,
>>> Ittay
>>>
>>> <configuration>
>>>   <property>
>>>     <name>mapred.job.tracker</name>
>>>     <value>master:54311</value>
>>>   </property>
>>>
>>>   <property>
>>>     <name>mapred.map.tasks</name>
>>>     <value>96</value>
>>>   </property>
>>>
>>>   <property>
>>>     <name>mapred.reduce.tasks</name>
>>>     <value>96</value>
>>>   </property>
>>>
>>>   <property>
>>>     <name>mapred.local.dir</name>
>>>     <value>/hdfs/0/mapred/local,/hdfs/1/mapred/local,/hdfs/2/mapred/local,/hdfs/3/mapred/local,/hdfs/4/mapred/local,/hdfs/5/mapred/local,/hdfs/6/mapred/local,/hdfs/7/mapred/local</value>
>>>   </property>
>>>
>>>   <property>
>>>     <name>mapred.tasktracker.map.tasks.maximum</name>
>>>     <value>24</value>
>>>   </property>
>>>
>>>   <property>
>>>     <name>mapred.tasktracker.reduce.tasks.maximum</name>
>>>     <value>24</value>
>>>   </property>
>>> </configuration>
>>>
>>
>>
>
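A common workaround for the situation in this thread (not proposed by the participants themselves; the file names and sizes below are purely illustrative) is to rewrite the input as many smaller .gz files, since Hadoop schedules one map task per non-splittable file:

```shell
# Synthetic demo: split one gzip file into several smaller .gz files.
# With N .gz inputs, Hadoop 1.x schedules N map tasks (one per file).
seq 1 100000 | gzip > input0.gz            # stand-in for a large input
zcat input0.gz | split -l 25000 - part_    # 4 chunks of 25000 lines each
for f in part_*; do gzip "$f"; done        # recompress each chunk separately
ls part_*.gz                               # 4 files -> up to 4 parallel mappers
```

Another option in this situation is a splittable compression codec such as bzip2, which lets a single large file be divided across many mappers.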