Re: Why do my tests show YARN is worse than MRv1 for terasort?
The terasort execution log shows that reduce spent about 5.5 minutes going
from 33% to 35% (08:02:46 to 08:08:16), as shown below. That band of reduce
progress (33% to 66%) corresponds to the merge/sort phase, so the stall
appears to fall at the start of the merge.
13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
13/06/10 *08:02:46* INFO mapreduce.Job:  map 100% reduce 33%
13/06/10 *08:08:16* INFO mapreduce.Job:  map 100% reduce 35%
13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%

Anyway, below are my configurations for your reference. Thanks!
*(A) core-site.xml*
Only 'fs.default.name' and 'hadoop.tmp.dir' are defined (a sketch is shown below).
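
A minimal sketch of that core-site.xml, for completeness. The NameNode URI
and tmp path are assumptions (node1 and the /opt/hadoop-2.0.4-alpha/temp
prefix are inferred from the other files below); note also that
'fs.default.name' is deprecated in favor of 'fs.defaultFS' in 2.x.

  <property>
    <name>fs.default.name</name>
    <value>hdfs://node1:9000</value><!-- assumed NameNode host/port -->
  </property>

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop-2.0.4-alpha/temp/hadoop</value><!-- assumed base tmp dir -->
  </property>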

*(B) hdfs-site.xml*
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>

  <property>
    <name>dfs.name.dir</name>
    <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
  </property>

  <property>
    <name>dfs.data.dir</name>
    <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
  </property>

  <property>
    <name>dfs.block.size</name>
    <value>134217728</value><!-- 128MB -->
  </property>

  <property>
    <name>dfs.namenode.handler.count</name>
    <value>64</value>
  </property>

  <property>
    <name>dfs.datanode.handler.count</name>
    <value>10</value>
  </property>
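
Side note: several of the keys above use 1.x names that are deprecated in
2.x. The current equivalents, carrying the same values, would be:

  <property>
    <name>dfs.namenode.name.dir</name><!-- 2.x name for dfs.name.dir -->
    <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
  </property>

  <property>
    <name>dfs.datanode.data.dir</name><!-- 2.x name for dfs.data.dir -->
    <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
  </property>

  <property>
    <name>dfs.blocksize</name><!-- 2.x name for dfs.block.size -->
    <value>134217728</value><!-- 128 MB -->
  </property>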

*(C) mapred-site.xml*
  <property>
    <name>mapreduce.cluster.temp.dir</name>
    <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
    <description>No description</description>
    <final>true</final>
  </property>

  <property>
    <name>mapreduce.cluster.local.dir</name>
    <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
    <description>No description</description>
    <final>true</final>
  </property>

  <property>
    <name>mapreduce.child.java.opts</name>
    <value>-Xmx1000m</value>
  </property>

  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>

  <property>
    <name>mapreduce.tasktracker.map.tasks.maximum</name>
    <value>8</value>
  </property>

  <property>
    <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>

  <property>
    <name>mapreduce.tasktracker.outofband.heartbeat</name>
    <value>true</value>
  </property>
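
A note on the block above: with mapreduce.framework.name set to yarn, the
mapreduce.tasktracker.* settings are ignored (they apply only to MRv1
TaskTrackers). Under YARN, per-task container sizes come from
mapreduce.map.memory.mb and mapreduce.reduce.memory.mb, which default to
1024 MB; a sketch of setting them explicitly to match the -Xmx1000m heap
(illustrative values, not from the original mail):

  <property>
    <name>mapreduce.map.memory.mb</name><!-- container size per map task -->
    <value>1024</value>
  </property>

  <property>
    <name>mapreduce.reduce.memory.mb</name><!-- container size per reduce task -->
    <value>1024</value>
  </property>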

*(D) yarn-site.xml*
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>node1:18025</value>
    <description>host is the hostname of the ResourceManager and port is the
    port on which the NodeManagers contact the ResourceManager.</description>
  </property>

  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>node1:18088</value>
    <description>The address of the RM web application.</description>
  </property>

  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>node1:18030</value>
    <description>host is the hostname of the ResourceManager and port is the
    port on which the applications in the cluster talk to the ResourceManager.</description>
  </property>

  <property>
    <name>yarn.resourcemanager.address</name>
    <value>node1:18040</value>
    <description>host is the hostname of the ResourceManager and port is the
    port on which the clients talk to the ResourceManager.</description>
  </property>

  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
    <description>the local directories used by the nodemanager</description>
  </property>

  <property>
    <name>yarn.nodemanager.address</name>
    <value>0.0.0.0:18050</value>
    <description>the nodemanagers bind to this port</description>
  </property>

  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>10240</value>
    <description>the amount of memory on the NodeManager, in MB</description>
  </property>

  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
    <description>directory on HDFS where the application logs are moved to</description>
  </property>

  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
    <description>the directories used by NodeManagers as log directories</description>
  </property>

  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
    <description>shuffle service that needs to be set for MapReduce to run</description>
  </property>

  <property>
    <name>yarn.resourcemanager.client.thread-count</name>
    <value>64</value>
  </property>

  <property>
    <name>yarn.nodemanager.resource.cpu-cores</name>
    <value>24</value>
  </property>

  <property>
    <name>yarn.nodemanager.vcores-pcores-ratio</name>
    <value>3</value>
  </property>

  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>22000</value>
  </property>

  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
  </property>
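
Note that yarn.nodemanager.resource.memory-mb is defined twice in this file
(10240 earlier, 22000 here); Hadoop's Configuration keeps the last
definition in a file, so the effective value should be 22000 MB. If that is
the intent, a single definition avoids the ambiguity:

  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>22000</value>
    <description>the amount of memory on the NodeManager, in MB</description>
  </property>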

2013/6/7 Harsh J <[EMAIL PROTECTED]>
