Sqoop, mail # user - Why Yarn has worse performance for terasort, than MRv1?


Why Yarn has worse performance for terasort, than MRv1?
sam liu 2013-06-07, 02:11
Hi Experts,

We are considering whether to adopt Yarn in the near future, so I
ran teragen/terasort on both Yarn and MRv1 for comparison.

My environment is a three-node cluster, and each node has similar hardware:
2 CPUs (4 cores), 32 mem. Both the Yarn and MRv1 clusters are set up on the
same environment. To be fair, I did not do any performance tuning on their
configurations and used the default configuration values.

Before testing, I expected Yarn to be much better than MRv1 when both use
default configurations, because Yarn is a better framework than MRv1.
However, the test results are mixed:

MRv1: Hadoop-1.1.1
Yarn: Hadoop-2.0.4

(A) Teragen: generate 10 GB data:
- MRv1: 193 sec
- Yarn: 69 sec
*Yarn is 2.8 times faster than MRv1*

(B) Terasort: sort 10 GB data:
- MRv1: 451 sec
- Yarn: 1136 sec
*Yarn is 2.5 times slower than MRv1*
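
For reference, the two ratios above come straight from the measured times. A minimal sketch of the arithmetic in Python (the names `timings` and `speed_ratio` are only illustrative and are not part of Hadoop):

```python
# Wall-clock times in seconds, copied from results (A) and (B) above.
timings = {
    "teragen": {"MRv1": 193, "Yarn": 69},
    "terasort": {"MRv1": 451, "Yarn": 1136},
}


def speed_ratio(slower: float, faster: float) -> float:
    """Return how many times longer the slower run took than the faster one."""
    return slower / faster


# Teragen: MRv1 is the slower run. Terasort: Yarn is the slower run.
teragen_ratio = speed_ratio(timings["teragen"]["MRv1"], timings["teragen"]["Yarn"])
terasort_ratio = speed_ratio(timings["terasort"]["Yarn"], timings["terasort"]["MRv1"])

print(f"Teragen: Yarn is {teragen_ratio:.1f}x faster than MRv1")
print(f"Terasort: Yarn is {terasort_ratio:.1f}x slower than MRv1")
```

This only restates the numbers; it does not say anything about where the time is spent inside each job.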

After a quick analysis, I think the direct cause might be that Yarn is much
faster than MRv1 in the Map phase, but much slower in the Reduce phase.

I have two questions:
*- Why do my tests show Yarn performing worse than MRv1 for terasort?*
*- What is the strategy for tuning Yarn performance? Are there any materials on it?*

Thanks!
--

Sam Liu