Re: Why my tests shows Yarn is worse than MRv1 for terasort?
The terasort execution log shows that the reduce phase spent about 5.5
minutes going from 33% to 35%, as shown below.
13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
13/06/10 *08:02:46* INFO mapreduce.Job:  map 100% reduce 33%
13/06/10 *08:08:16* INFO mapreduce.Job:  map 100% reduce 35%
13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
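The 5.5-minute figure comes straight from the two highlighted timestamps. A quick standalone check of that arithmetic (the two log lines are copied from above; the timestamp slice assumes the standard `yy/MM/dd HH:mm:ss` prefix the Hadoop job client prints):

```python
from datetime import datetime

# Two progress lines from the terasort log above.
lines = [
    "13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%",
    "13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%",
]

def stamp(line):
    # The Hadoop job client prefixes each line with yy/MM/dd HH:mm:ss.
    return datetime.strptime(line[:17], "%y/%m/%d %H:%M:%S")

gap = stamp(lines[1]) - stamp(lines[0])
print(gap.total_seconds() / 60)  # 5.5 minutes stalled between 33% and 35%
```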

Anyway, below are my configurations for your reference. Thanks!
*(A) core-site.xml*
Only 'fs.default.name' and 'hadoop.tmp.dir' are defined here.
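For completeness, a minimal core-site.xml matching that description might look like the following. The URI and path are illustrative assumptions, not copied from the cluster:

```xml
  <property>
    <name>fs.default.name</name>
    <!-- hostname and port are assumed for illustration -->
    <value>hdfs://node1:9000</value>
  </property>

  <property>
    <name>hadoop.tmp.dir</name>
    <!-- assumed to sit under the same tree as the other paths below -->
    <value>/opt/hadoop-2.0.4-alpha/temp/hadoop</value>
  </property>
```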

*(B) hdfs-site.xml*
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>

  <property>
    <name>dfs.name.dir</name>
    <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
  </property>

  <property>
    <name>dfs.data.dir</name>
    <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
  </property>

  <property>
    <name>dfs.block.size</name>
    <value>134217728</value><!-- 128MB -->
  </property>

  <property>
    <name>dfs.namenode.handler.count</name>
    <value>64</value>
  </property>

  <property>
    <name>dfs.datanode.handler.count</name>
    <value>10</value>
  </property>

*(C) mapred-site.xml*
  <property>
    <name>mapreduce.cluster.temp.dir</name>
    <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
    <description>No description</description>
    <final>true</final>
  </property>

  <property>
    <name>mapreduce.cluster.local.dir</name>
    <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
    <description>No description</description>
    <final>true</final>
  </property>

  <property>
    <name>mapreduce.child.java.opts</name>
    <value>-Xmx1000m</value>
  </property>

  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>

  <property>
    <name>mapreduce.tasktracker.map.tasks.maximum</name>
    <value>8</value>
  </property>

  <property>
    <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>

  <property>
    <name>mapreduce.tasktracker.outofband.heartbeat</name>
    <value>true</value>
  </property>
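Note that the mapreduce.tasktracker.* settings above are MRv1 (TaskTracker) properties and have no effect when mapreduce.framework.name is yarn; under YARN, per-task resources are expressed as container memory requests instead. A sketch with illustrative values:

```xml
  <property>
    <name>mapreduce.map.memory.mb</name>
    <!-- container size requested per map task; value is illustrative -->
    <value>1024</value>
  </property>

  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <!-- container size requested per reduce task; value is illustrative -->
    <value>2048</value>
  </property>
```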

*(D) yarn-site.xml*
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>node1:18025</value>
    <description>host is the hostname of the resource manager and
    port is the port on which the NodeManagers contact the Resource Manager.
    </description>
  </property>

  <property>
    <description>The address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>node1:18088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>node1:18030</value>
    <description>host is the hostname of the resourcemanager and port is
the port
    on which the Applications in the cluster talk to the Resource Manager.
    </description>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>node1:18040</value>
    <description>the host is the hostname of the ResourceManager and the
port is the port on
    which the clients can talk to the Resource Manager. </description>
  </property>

  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
    <description>the local directories used by the nodemanager</description>
  </property>

  <property>
    <name>yarn.nodemanager.address</name>
    <value>0.0.0.0:18050</value>
    <description>the nodemanagers bind to this port</description>
  </property>

  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>10240</value>
    <description>the amount of memory on the NodeManager in MB</description>
  </property>

  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
    <description>directory on hdfs where the application logs are moved to
</description>
  </property>

  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
    <description>the directories used by Nodemanagers as log
directories</description>
  </property>

  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
    <description>shuffle service that needs to be set for Map Reduce to run
</description>
  </property>

  <property>
    <name>yarn.resourcemanager.client.thread-count</name>
    <value>64</value>
  </property>

  <property>
    <name>yarn.nodemanager.resource.cpu-cores</name>
    <value>24</value>
  </property>

  <property>
    <name>yarn.nodemanager.vcores-pcores-ratio</name>
    <value>3</value>
  </property>

  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>22000</value>
  </property>

  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
  </property>
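One thing worth flagging: yarn.nodemanager.resource.memory-mb is defined twice in this yarn-site.xml (10240 earlier and 22000 here), and Hadoop's Configuration keeps the last occurrence it reads. A small standalone sketch (not a Hadoop tool) for catching such duplicates in any *-site.xml:

```python
import xml.etree.ElementTree as ET
from collections import Counter

def duplicate_properties(xml_text):
    """Return property names defined more than once in a *-site.xml body."""
    root = ET.fromstring(xml_text)
    names = Counter(p.findtext("name") for p in root.iter("property"))
    return sorted(n for n, count in names.items() if count > 1)

# Trimmed-down example mirroring the duplicate above.
conf = """<configuration>
  <property><name>yarn.nodemanager.resource.memory-mb</name><value>10240</value></property>
  <property><name>yarn.nodemanager.resource.memory-mb</name><value>22000</value></property>
  <property><name>yarn.nodemanager.vmem-pmem-ratio</name><value>2.1</value></property>
</configuration>"""

print(duplicate_properties(conf))  # ['yarn.nodemanager.resource.memory-mb']
```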

2013/6/7 Harsh J <[EMAIL PROTECTED]>