Re: Why my tests show Yarn is worse than MRv1 for terasort?

Earlier messages in this thread:
sam liu 2013-06-07, 03:15
Sandy Ryza 2013-06-07, 06:53

Hey Sam,

Did you get a chance to retry with Sandy's suggestions? The config
appears to be asking NMs to run roughly 22 total containers (as opposed
to 12 total tasks in the MR1 config) due to a 22 GB memory resource.
That difference alone could have a sizable impact, given that the CPU
is still the same for both test runs.
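
Back-of-the-envelope, the concurrency gap looks like this; note that the
22528 MB node resource and the 1024 MB per-container ask are my
assumptions from this thread, not values read off your cluster:

  // Minimal sketch of the concurrency math above.
  public class ConcurrencyMath {
      public static void main(String[] args) {
          int nmMemoryMb = 22 * 1024; // yarn.nodemanager.resource.memory-mb (assumed)
          int containerMb = 1024;     // default mapreduce.map/reduce.memory.mb in 2.x
          int yarnTasks = nmMemoryMb / containerMb; // ~22 concurrent containers
          int mr1Slots = 8 + 4;       // map + reduce slot maximums from the MR1 config
          System.out.println("YARN containers: " + yarnTasks + ", MR1 slots: " + mr1Slots);
      }
  }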

On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <[EMAIL PROTECTED]> wrote:
> Hey Sam,
>
> Thanks for sharing your results.  I'm definitely curious about what's
> causing the difference.
>
> A couple of observations:
> It looks like you've got yarn.nodemanager.resource.memory-mb in there twice
> with two different values.
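>
> If you want to check which of the two values actually wins, here is a
> minimal sketch (my assumption is that Configuration keeps the last
> definition it reads for a repeated key; worth verifying):
>
>   import org.apache.hadoop.conf.Configuration;
>
>   public class EffectiveValue {
>       public static void main(String[] args) {
>           Configuration conf = new Configuration(false); // skip default resources
>           conf.addResource("yarn-site.xml");             // must be on the classpath
>           // Prints the effective value for the duplicated key.
>           System.out.println(conf.get("yarn.nodemanager.resource.memory-mb"));
>       }
>   }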
>
> Your max JVM memory of 1000 MB is (dangerously?) close to the default
> mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your tasks getting
> killed for running over resource limits?
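>
> If they are, a common rule of thumb (my convention, not a Hadoop
> default) is to keep the heap around 80% of the container size so there
> is headroom for non-heap memory:
>
>   // Hypothetical sizing helper illustrating the 80% rule of thumb.
>   public class HeapSizing {
>       static String childJavaOptsFor(int containerMb) {
>           int heapMb = (int) (containerMb * 0.8); // leave ~20% for non-heap overhead
>           return "-Xmx" + heapMb + "m";
>       }
>       public static void main(String[] args) {
>           System.out.println(childJavaOptsFor(1536)); // prints -Xmx1228m
>       }
>   }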
>
> -Sandy
>
>
> On Thu, Jun 6, 2013 at 10:21 PM, sam liu <[EMAIL PROTECTED]> wrote:
>>
>> The terasort execution log shows that the reduce phase spent about 5.5
>> minutes going from 33% to 35%, as shown below.
>> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
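>>
>> As I understand it (please correct me if I'm wrong), reduce progress is
>> reported in thirds, so a stall right after 33% sits in the merge/sort
>> phase rather than in the reduce function itself:
>>
>>   // Rough mapping, assuming the usual thirds convention for reduce progress.
>>   public class ReducePhase {
>>       static String phaseOf(int percent) {
>>           if (percent < 33) return "shuffle/copy";
>>           if (percent < 66) return "merge/sort"; // the 33%-35% stall above is here
>>           return "reduce function";
>>       }
>>       public static void main(String[] args) {
>>           System.out.println(phaseOf(34)); // merge/sort
>>       }
>>   }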
>>
>> Anyway, below are my configurations for your reference. Thanks!
>> (A) core-site.xml
>> only defines 'fs.default.name' and 'hadoop.tmp.dir'
>>
>> (B) hdfs-site.xml
>>   <property>
>>     <name>dfs.replication</name>
>>     <value>1</value>
>>   </property>
>>
>>   <property>
>>     <name>dfs.name.dir</name>
>>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>   </property>
>>
>>   <property>
>>     <name>dfs.data.dir</name>
>>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>   </property>
>>
>>   <property>
>>     <name>dfs.block.size</name>
>>     <value>134217728</value><!-- 128MB -->
>>   </property>
>>
>>   <property>
>>     <name>dfs.namenode.handler.count</name>
>>     <value>64</value>
>>   </property>
>>
>>   <property>
>>     <name>dfs.datanode.handler.count</name>
>>     <value>10</value>
>>   </property>
>>
>> (C) mapred-site.xml
>>   <property>
>>     <name>mapreduce.cluster.temp.dir</name>
>>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>     <description>No description</description>
>>     <final>true</final>
>>   </property>
>>
>>   <property>
>>     <name>mapreduce.cluster.local.dir</name>
>>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>     <description>No description</description>
>>     <final>true</final>
>>   </property>
>>
>>   <property>
>>     <name>mapreduce.child.java.opts</name>
>>     <value>-Xmx1000m</value>
>>   </property>
>>
>>   <property>
>>     <name>mapreduce.framework.name</name>
>>     <value>yarn</value>
>>   </property>
>>
>>   <property>
>>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>     <value>8</value>
>>   </property>
>>
>>   <property>
>>     <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>     <value>4</value>
>>   </property>
>>
>>   <property>
>>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>     <value>true</value>
>>   </property>
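>>
>> (Note: if I understand the docs correctly, the mapreduce.tasktracker.*
>> properties above are MR1/TaskTracker settings and are ignored once
>> mapreduce.framework.name is set to yarn; under YARN the effective task
>> concurrency comes from the NodeManager memory resource divided by the
>> per-task memory, not from slot counts.)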
>>
>> (D) yarn-site.xml
>>   <property>
>>     <name>yarn.resourcemanager.resource-tracker.address</name>
>>     <value>node1:18025</value>
>>     <description>host is the hostname of the resource manager and port is
>>     the port on which the NodeManagers contact the Resource Manager.
>>     </description>
>>   </property>
>>
>>   <property>
>>     <description>The address of the RM web application.</description>
>>     <name>yarn.resourcemanager.webapp.address</name>
>>     <value>node1:18088</value>
>>   </property>
>>
>>   <property>
>>     <name>yarn.resourcemanager.scheduler.address</name>
>>     <value>node1:18030</value>
>>     <description>host is the hostname of the resourcemanager and port is

Harsh J

Later messages in this thread:
sam liu 2013-06-09, 08:51
Harsh J 2013-06-09, 13:03
sam liu 2013-06-18, 08:58
Michel Segel 2013-06-18, 10:11
Sandy Ryza 2013-10-22, 23:45