Why does Yarn have worse terasort performance than MRv1?
Hi Experts,

We are deciding whether to adopt Yarn in the near future, so I
ran teragen/terasort on both Yarn and MRv1 for comparison.

My environment is a three-node cluster, and each node has similar hardware:
2 CPUs (4 cores) and 32 GB of memory. The Yarn and MRv1 clusters are set up
on the same environment. To keep the comparison fair, I did not do any
performance tuning on either configuration; both use the default values.

Before testing, I expected Yarn to perform much better than MRv1 when both
use default configurations, because Yarn is a better framework than MRv1.
However, the test results show something different:

MRv1: Hadoop-1.1.1
Yarn: Hadoop-2.0.4

(A) Teragen: generate 10 GB of data:
- MRv1: 193 sec
- Yarn: 69 sec
*Yarn is 2.8 times faster than MRv1*

(B) Terasort: sort 10 GB of data:
- MRv1: 451 sec
- Yarn: 1136 sec
*Yarn is 2.5 times slower than MRv1*
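To double-check the ratios above, here is a small Python sketch that recomputes them from the raw timings. The names `TIMINGS` and `compare` are just for illustration; they are not anything from Hadoop.

```python
"""Recompute the MRv1-vs-Yarn ratios from the raw timings in this post."""

# Wall-clock times in seconds, copied from results (A) and (B); 10 GB of data each.
TIMINGS = {
    "teragen": {"mrv1": 193.0, "yarn": 69.0},
    "terasort": {"mrv1": 451.0, "yarn": 1136.0},
}


def compare(mrv1_sec: float, yarn_sec: float) -> str:
    """Describe how Yarn's run time compares with MRv1's for the same job."""
    if yarn_sec <= mrv1_sec:
        # Yarn finished first: express the gap as a speedup factor.
        return f"Yarn is {mrv1_sec / yarn_sec:.1f} times faster than MRv1"
    # Yarn finished last: express the gap as a slowdown factor.
    return f"Yarn is {yarn_sec / mrv1_sec:.1f} times slower than MRv1"


if __name__ == "__main__":
    for job, t in TIMINGS.items():
        print(f"{job}: {compare(t['mrv1'], t['yarn'])}")
```

Running it prints one line per job, matching the 2.8x and 2.5x figures quoted above.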

After a quick analysis, I think the direct cause may be that Yarn is much
faster than MRv1 in the map phase but much slower in the reduce phase.

Here I have two questions:
*- Why do my tests show Yarn performing worse than MRv1 for terasort?
- What is the strategy for tuning Yarn performance? Are there any materials on this?*


Sam Liu