HDFS, mail # user - Re: Why my tests shows Yarn is worse than MRv1 for terasort?


Re: Why my tests shows Yarn is worse than MRv1 for terasort?
Marcos Luis Ortiz Valmase... 2013-06-07, 03:05
Why not tune the configurations?
Both frameworks have many areas to tune:
- Combiners, shuffle optimization, block size, etc.
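As a rough sketch of what such tuning could look like on the Yarn side, the snippet below builds a terasort command line with a few shuffle-related overrides passed as -D options. The property names are the Hadoop 2.x ones; the values, the jar name and the HDFS paths are placeholders assumed for illustration, not recommendations.

```python
# Illustrative overrides only: the values are assumptions, not tuned settings.
SHUFFLE_OVERRIDES = {
    "mapreduce.job.reduces": 6,                     # number of reduce tasks
    "mapreduce.task.io.sort.mb": 256,               # map-side sort buffer (MB)
    "mapreduce.reduce.shuffle.parallelcopies": 10,  # parallel fetches per reducer
    "mapreduce.map.output.compress": "true",        # compress map output
}


def terasort_command(examples_jar, input_dir, output_dir, overrides):
    """Return the argument list for a terasort run with -D overrides."""
    args = ["hadoop", "jar", examples_jar, "terasort"]
    # Generic options (-Dkey=value) must come before the job arguments.
    for key, value in sorted(overrides.items()):
        args.append("-D{}={}".format(key, value))
    args += [input_dir, output_dir]
    return args


if __name__ == "__main__":
    # The jar name and paths are hypothetical placeholders.
    cmd = terasort_command(
        "hadoop-mapreduce-examples.jar",
        "/benchmarks/tera-in",
        "/benchmarks/tera-out",
        SHUFFLE_OVERRIDES,
    )
    print(" ".join(cmd))
```

On Hadoop 1.x the corresponding properties have different names (for example mapred.reduce.tasks and io.sort.mb), so the same overrides cannot be reused unchanged across both clusters.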

2013/6/6 sam liu <[EMAIL PROTECTED]>

> Hi Experts,
>
> We are thinking about whether to use Yarn or not in the near future, and I
> ran teragen/terasort on Yarn and MRv1 for comparison.
>
> My environment is a three-node cluster, and each node has similar
> hardware: 2 CPUs (4 cores), 32 mem. Both the Yarn and MRv1 clusters are
> set up on the same environment. To be fair, I did not do any performance
> tuning of their configurations, but used the default configuration values.
>
> Before testing, I expected Yarn to be much better than MRv1 when both use
> the default configuration, because Yarn is a better framework than MRv1.
> However, the test results show some differences:
>
> MRv1: Hadoop-1.1.1
> Yarn: Hadoop-2.0.4
>
> (A) Teragen: generate 10 GB data:
> - MRv1: 193 sec
> - Yarn: 69 sec
> *Yarn is 2.8 times faster than MRv1*
>
> (B) Terasort: sort 10 GB data:
> - MRv1: 451 sec
> - Yarn: 1136 sec
> *Yarn is 2.5 times slower than MRv1*
>
> After a quick analysis, I think the direct cause might be that Yarn is
> much faster than MRv1 in the Map phase, but much slower in the Reduce phase.
>
> Here I have two questions:
> *- Why do my tests show that Yarn is worse than MRv1 for terasort?*
>
> *- What is the strategy for tuning Yarn performance? Are there any
> materials?*
>
> Thanks!
>
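
For reference, the ratios quoted above can be re-derived from the reported timings. This is a small check that uses only the numbers in the message; the dictionary layout and function name are assumptions made for illustration.

```python
# Wall-clock times in seconds, copied from the message above.
TIMINGS_SEC = {
    "teragen": {"MRv1": 193, "Yarn": 69},
    "terasort": {"MRv1": 451, "Yarn": 1136},
}


def times_slower(job, slower, faster):
    """Return how many times longer `slower` took than `faster` for a job."""
    return TIMINGS_SEC[job][slower] / TIMINGS_SEC[job][faster]


if __name__ == "__main__":
    # Teragen: MRv1 took longer than Yarn.
    print("teragen: MRv1/Yarn = %.1f" % times_slower("teragen", "MRv1", "Yarn"))
    # Terasort: Yarn took longer than MRv1.
    print("terasort: Yarn/MRv1 = %.1f" % times_slower("terasort", "Yarn", "MRv1"))
```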

--
Marcos Ortiz Valmaseda
Product Manager at PDVSA
http://about.me/marcosortiz