Re: Why my tests shows Yarn is worse than MRv1 for terasort?
Sandy Ryza 2013-10-22, 23:45
It looks like many of your reduce tasks were killed.  Do you know why?
 Also, MR2 doesn't have JVM reuse, so it might make sense to compare it to
MR1 with JVM reuse turned off.

-Sandy
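
For the MR1 side of the comparison: JVM reuse is controlled by the MR1 property mapred.job.reuse.jvm.num.tasks (as Sandy notes, MR2 has no equivalent). A minimal mapred-site.xml sketch that keeps reuse off, i.e. one task per JVM, which is also the MR1 default:

    <!-- mapred-site.xml (MR1 only): run one task per JVM, i.e. no reuse.
         A value of -1 would instead allow unlimited reuse within a job. -->
    <property>
      <name>mapred.job.reuse.jvm.num.tasks</name>
      <value>1</value>
    </property>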
On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <[EMAIL PROTECTED]> wrote:

> The Terasort output for MR2 is as follows.
>
> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job (main):
> Counters: 46
>         File System Counters
>                 FILE: Number of bytes read=456102049355
>                 FILE: Number of bytes written=897246250517
>                 FILE: Number of read operations=0
>                 FILE: Number of large read operations=0
>                 FILE: Number of write operations=0
>                 HDFS: Number of bytes read=1000000851200
>                 HDFS: Number of bytes written=1000000000000
>                 HDFS: Number of read operations=32131
>                 HDFS: Number of large read operations=0
>                 HDFS: Number of write operations=224
>         Job Counters
>                 Killed map tasks=1
>                 Killed reduce tasks=20
>                 Launched map tasks=7601
>                 Launched reduce tasks=132
>                 Data-local map tasks=7591
>                 Rack-local map tasks=10
>                 Total time spent by all maps in occupied slots (ms)=1696141311
>                 Total time spent by all reduces in occupied slots (ms)=2664045096
>         Map-Reduce Framework
>                 Map input records=10000000000
>                 Map output records=10000000000
>                 Map output bytes=1020000000000
>                 Map output materialized bytes=440486356802
>                 Input split bytes=851200
>                 Combine input records=0
>                 Combine output records=0
>                 Reduce input groups=10000000000
>                 Reduce shuffle bytes=440486356802
>                 Reduce input records=10000000000
>                 Reduce output records=10000000000
>                 Spilled Records=20000000000
>                 Shuffled Maps =851200
>                 Failed Shuffles=61
>                 Merged Map outputs=851200
>                 GC time elapsed (ms)=4215666
>                 CPU time spent (ms)=192433000
>                 Physical memory (bytes) snapshot=3349356380160
>                 Virtual memory (bytes) snapshot=9665208745984
>                 Total committed heap usage (bytes)=3636854259712
>         Shuffle Errors
>                 BAD_ID=0
>                 CONNECTION=0
>                 IO_ERROR=4
>                 WRONG_LENGTH=0
>                 WRONG_MAP=0
>                 WRONG_REDUCE=0
>         File Input Format Counters
>                 Bytes Read=1000000000000
>         File Output Format Counters
>                 Bytes Written=1000000000000
>
> Thanks,
>
> John
>
>
>
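A quick arithmetic check on the counters above: Launched reduce tasks=132 minus Killed reduce tasks=20 leaves 112 completed reduces, and 7600 completed maps x 112 reduces = 851200, matching the Shuffled Maps counter, so the kills are accounted for; with speculative execution enabled for both maps and reduces (see the configuration below), some killed attempts are expected. The occupied-slot times convert to 1,696,141,311 ms ≈ 471 map-slot hours and 2,664,045,096 ms ≈ 740 reduce-slot hours. And Map output bytes (1,020,000,000,000) versus materialized bytes (440,486,356,802) is a ratio of about 2.3, which suggests map-output compression was enabled.
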
> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop 1.0.3, and it
>> turned out that terasort on MR2 is two times slower than on MR1. I cannot
>> really believe it.
>>
>> The cluster has 20 nodes, 19 of them data nodes. My Hadoop 2.2.0 cluster
>> configuration is as follows.
>>
>>         mapreduce.map.java.opts = "-Xmx512m";
>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>         mapreduce.map.memory.mb = "768";
>>         mapreduce.reduce.memory.mb = "2048";
>>
>>         yarn.scheduler.minimum-allocation-mb = "256";
>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>         yarn.nodemanager.resource.memory-mb = "12288";
>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>
>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>         mapreduce.task.io.sort.factor = "48";
>>         mapreduce.task.io.sort.mb = "200";
>>         mapreduce.map.speculative = "true";
>>         mapreduce.reduce.speculative = "true";
>>         mapreduce.framework.name = "yarn";
>>         yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
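
A rough capacity check on these settings, assuming memory is the binding resource: each NodeManager advertises 12288 MB, enough for 12288 / 768 = 16 concurrent map containers or 12288 / 2048 = 6 concurrent reduce containers; across the 19 data nodes that is at most 304 maps or 114 reduces running at once. The heaps fit their containers (-Xmx512m inside 768 MB for maps, -Xmx1536m inside 2048 MB for reduces), with the difference left as headroom for non-heap JVM memory.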