search-hadoop.com
MapReduce user mailing list — Re: Why my tests shows Yarn is worse than MRv1 for terasort?


Thread:
  Sandy Ryza 2013-10-23, 16:40
  Jian Fang 2013-10-23, 17:05
  Jian Fang 2013-10-23, 19:55
  Jian Fang 2013-10-23, 20:46
  Jian Fang 2013-10-23, 21:16
  Jian Fang 2013-10-23, 23:26

Re: Why my tests shows Yarn is worse than MRv1 for terasort?
It looks like many of your reduce tasks were killed.  Do you know why?
 Also, MR2 doesn't have JVM reuse, so it might make sense to compare it to
MR1 with JVM reuse turned off.

-Sandy
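[For reference: in MR1, JVM reuse is controlled by the mapred.job.reuse.jvm.num.tasks property. A sketch of the mapred-site.xml entry to turn it off, assuming Hadoop 1.x property names; 1 means each JVM runs a single task (no reuse), -1 means unlimited reuse:]

```xml
<!-- mapred-site.xml (MR1): give each task a fresh JVM, matching MR2
     behavior, so the comparison is apples-to-apples. -->
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>1</value>
</property>
```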
On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <[EMAIL PROTECTED]> wrote:

> The Terasort output for MR2 is as follows.
>
> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job (main):
> Counters: 46
>         File System Counters
>                 FILE: Number of bytes read=456102049355
>                 FILE: Number of bytes written=897246250517
>                 FILE: Number of read operations=0
>                 FILE: Number of large read operations=0
>                 FILE: Number of write operations=0
>                 HDFS: Number of bytes read=1000000851200
>                 HDFS: Number of bytes written=1000000000000
>                 HDFS: Number of read operations=32131
>                 HDFS: Number of large read operations=0
>                 HDFS: Number of write operations=224
>         Job Counters
>                 Killed map tasks=1
>                 Killed reduce tasks=20
>                 Launched map tasks=7601
>                 Launched reduce tasks=132
>                 Data-local map tasks=7591
>                 Rack-local map tasks=10
>                 Total time spent by all maps in occupied slots
> (ms)=1696141311
>                 Total time spent by all reduces in occupied slots
> (ms)=2664045096
>         Map-Reduce Framework
>                 Map input records=10000000000
>                 Map output records=10000000000
>                 Map output bytes=1020000000000
>                 Map output materialized bytes=440486356802
>                 Input split bytes=851200
>                 Combine input records=0
>                 Combine output records=0
>                 Reduce input groups=10000000000
>                 Reduce shuffle bytes=440486356802
>                 Reduce input records=10000000000
>                 Reduce output records=10000000000
>                 Spilled Records=20000000000
>                 Shuffled Maps =851200
>                 Failed Shuffles=61
>                 Merged Map outputs=851200
>                 GC time elapsed (ms)=4215666
>                 CPU time spent (ms)=192433000
>                 Physical memory (bytes) snapshot=3349356380160
>                 Virtual memory (bytes) snapshot=9665208745984
>                 Total committed heap usage (bytes)=3636854259712
>         Shuffle Errors
>                 BAD_ID=0
>                 CONNECTION=0
>                 IO_ERROR=4
>                 WRONG_LENGTH=0
>                 WRONG_MAP=0
>                 WRONG_REDUCE=0
>         File Input Format Counters
>                 Bytes Read=1000000000000
>         File Output Format Counters
>                 Bytes Written=1000000000000
>
> Thanks,
>
> John
>
>
>
> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop 1.0.3 and it
>> turned out that the terasort for MR2 is 2 times slower than that in MR1. I
>> cannot really believe it.
>>
>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0 cluster
>> configurations are as follows.
>>
>>         mapreduce.map.java.opts = "-Xmx512m";
>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>         mapreduce.map.memory.mb = "768";
>>         mapreduce.reduce.memory.mb = "2048";
>>
>>         yarn.scheduler.minimum-allocation-mb = "256";
>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>         yarn.nodemanager.resource.memory-mb = "12288";
>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>
>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>         mapreduce.task.io.sort.factor = "48";
>>         mapreduce.task.io.sort.mb = "200";
>>         mapreduce.map.speculative = "true";
>>         mapreduce.reduce.speculative = "true";
>>         mapreduce.framework.name = "yarn";
>>         yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
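
[Given the settings quoted above, a quick sketch of theoretical container capacity — assuming the 19 data nodes and 12288 MB of NodeManager memory per node stated in the message; these are upper bounds that ignore the ApplicationMaster's own container:]

```python
# Rough YARN container-capacity check from the quoted configuration.
# Assumptions (from the email): 19 data nodes, 12288 MB per NodeManager,
# 768 MB map containers, 2048 MB reduce containers.
NODES = 19
NM_MEM_MB = 12288
MAP_CONTAINER_MB = 768
REDUCE_CONTAINER_MB = 2048

maps_per_node = NM_MEM_MB // MAP_CONTAINER_MB        # 16 map containers/node
reduces_per_node = NM_MEM_MB // REDUCE_CONTAINER_MB  # 6 reduce containers/node

print(maps_per_node * NODES)     # 304 concurrent map containers cluster-wide
print(reduces_per_node * NODES)  # 114 concurrent reduce containers cluster-wide
```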
More replies in this thread:
  Jian Fang 2013-10-23, 00:50
  Jian Fang 2013-10-23, 07:23
  Sandy Ryza 2013-10-23, 15:17
  Jian Fang 2013-10-23, 16:20