Re: Why my tests shows Yarn is worse than MRv1 for terasort?
Harsh J 2013-06-08, 15:09
Hey Sam,

Did you get a chance to retry with Sandy's suggestions? The config
appears to be asking the NMs to run roughly 22 total containers (as
opposed to 12 total task slots in the MR1 config) due to the 22 GB
memory resource. That alone could have a noticeable impact, given
that the CPU available is the same for both test runs.
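
For illustration only, a minimal sketch of how the YARN concurrency could be
brought back in line with the MR1 setup. The numbers are assumptions derived
from the figures mentioned above (22 GB NM memory, the 1024 MB default
container request, and the 8 map + 4 reduce MR1 slots), not settings taken
from Sam's cluster:

  <!-- yarn-site.xml: total memory the NodeManager offers to containers -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>12288</value> <!-- assumed: 12 GB instead of 22 GB -->
  </property>

  <!-- mapred-site.xml: per-task container requests (these are the defaults) -->
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>1024</value>
  </property>

  <!-- 12288 MB / 1024 MB per container is roughly 12 concurrent containers
       per node, comparable to the 8 map + 4 reduce slots in the MR1 config -->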

On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <[EMAIL PROTECTED]> wrote:
> Hey Sam,
>
> Thanks for sharing your results.  I'm definitely curious about what's
> causing the difference.
>
> A couple observations:
> It looks like you've got yarn.nodemanager.resource.memory-mb in there twice
> with two different values.
>
> Your max JVM memory of 1000 MB is (dangerously?) close to the default
> mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your tasks getting
> killed for running over resource limits?
>
> -Sandy
>
>
> On Thu, Jun 6, 2013 at 10:21 PM, sam liu <[EMAIL PROTECTED]> wrote:
>>
>> The terasort execution log shows that the reduce phase spent about 5.5
>> minutes going from 33% to 35%, as shown below.
>> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>
>> Anyway, below are my configurations for your reference. Thanks!
>> (A) core-site.xml
>> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>
>> (B) hdfs-site.xml
>>   <property>
>>     <name>dfs.replication</name>
>>     <value>1</value>
>>   </property>
>>
>>   <property>
>>     <name>dfs.name.dir</name>
>>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>   </property>
>>
>>   <property>
>>     <name>dfs.data.dir</name>
>>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>   </property>
>>
>>   <property>
>>     <name>dfs.block.size</name>
>>     <value>134217728</value><!-- 128MB -->
>>   </property>
>>
>>   <property>
>>     <name>dfs.namenode.handler.count</name>
>>     <value>64</value>
>>   </property>
>>
>>   <property>
>>     <name>dfs.datanode.handler.count</name>
>>     <value>10</value>
>>   </property>
>>
>> (C) mapred-site.xml
>>   <property>
>>     <name>mapreduce.cluster.temp.dir</name>
>>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>     <description>No description</description>
>>     <final>true</final>
>>   </property>
>>
>>   <property>
>>     <name>mapreduce.cluster.local.dir</name>
>>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>     <description>No description</description>
>>     <final>true</final>
>>   </property>
>>
>>   <property>
>>     <name>mapreduce.child.java.opts</name>
>>     <value>-Xmx1000m</value>
>>   </property>
>>
>>   <property>
>>     <name>mapreduce.framework.name</name>
>>     <value>yarn</value>
>>   </property>
>>
>>   <property>
>>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>     <value>8</value>
>>   </property>
>>
>>   <property>
>>     <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>     <value>4</value>
>>   </property>
>>
>>
>>   <property>
>>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>     <value>true</value>
>>   </property>
>>
>> (D) yarn-site.xml
>>  <property>
>>     <name>yarn.resourcemanager.resource-tracker.address</name>
>>     <value>node1:18025</value>
>>     <description>host is the hostname of the resource manager and
>>     port is the port on which the NodeManagers contact the Resource
>> Manager.
>>     </description>
>>   </property>
>>
>>   <property>
>>     <description>The address of the RM web application.</description>
>>     <name>yarn.resourcemanager.webapp.address</name>
>>     <value>node1:18088</value>
>>   </property>
>>
>>
>>   <property>
>>     <name>yarn.resourcemanager.scheduler.address</name>
>>     <value>node1:18030</value>
>>     <description>host is the hostname of the resourcemanager and port is

Harsh J
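
As a rough illustration of Sandy's point about the -Xmx1000m heap sitting
right under the default mapreduce.map/reduce.memory.mb of 1024 MB, here is a
minimal mapred-site.xml sketch that leaves headroom between the JVM heap and
the container limit. The 1536 MB figure is an assumed value, not one from this
thread; note that in MR2 the per-task mapreduce.map.java.opts and
mapreduce.reduce.java.opts take precedence over mapreduce.child.java.opts
when set:

  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1536</value> <!-- assumed container size; ~500 MB above the heap -->
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1000m</value> <!-- heap stays well under the container limit -->
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>1536</value>
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx1000m</value>
  </property>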