Re: Why my tests shows Yarn is worse than MRv1 for terasort?
Why not tune the configurations?
Both frameworks have many areas to tune:
- Combiners, Shuffle optimization, Block size, etc
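
As a rough illustration of what such tuning could look like for the terasort job, here is a minimal Python sketch that assembles a command line with a few -D overrides. The property names are the Hadoop 2 ones (under Hadoop 1 the equivalents are mapred.reduce.tasks, io.sort.mb and mapred.reduce.parallel.copies). The jar name, the paths and every value are placeholders I made up for the example, not recommendations, and build_terasort_cmd is a hypothetical helper.

```python
# Sketch only: build a terasort command line with a few -D overrides.
# The jar name, paths and all values are hypothetical placeholders.

def build_terasort_cmd(examples_jar, input_dir, output_dir, overrides):
    """Return the argv list for a terasort run with the given -D overrides."""
    cmd = ["hadoop", "jar", examples_jar, "terasort"]
    # Generic -D options must come before the job's own arguments.
    for key, value in sorted(overrides.items()):
        cmd.append(f"-D{key}={value}")
    cmd += [input_dir, output_dir]
    return cmd


# Hadoop 2 property names; the values are illustrative, not tuned.
overrides = {
    "mapreduce.job.reduces": 12,                    # number of reduce tasks
    "mapreduce.task.io.sort.mb": 256,               # map-side sort buffer (MB)
    "mapreduce.reduce.shuffle.parallelcopies": 10,  # parallel shuffle fetches
}

cmd = build_terasort_cmd(
    "hadoop-mapreduce-examples.jar", "/tera/in", "/tera/out", overrides
)
print(" ".join(cmd))
```

Changing one property at a time and rerunning the same 10 GB sort should make it easier to see which setting actually affects the reduce phase.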
2013/6/6 sam liu <[EMAIL PROTECTED]>
> Hi Experts,
> We are thinking about whether to use Yarn in the near future, and I
> ran teragen/terasort on Yarn and MRv1 for comparison.
> My env is a three-node cluster, and each node has similar hardware: 2 CPU (4
> core), 32 mem. Both the Yarn and MRv1 clusters are set up on the same env. To
> be fair, I did not do any performance tuning on their configurations, but
> used the default configuration values.
> Before testing, I expected Yarn to be much faster than MRv1 when both
> use the default configuration, because Yarn is a better framework than MRv1.
> However, the test results show some differences:
> MRv1: Hadoop-1.1.1
> Yarn: Hadoop-2.0.4
> (A) Teragen: generate 10 GB data:
> - MRv1: 193 sec
> - Yarn: 69 sec
> *Yarn is about 2.8 times faster than MRv1*
> (B) Terasort: sort 10 GB data:
> - MRv1: 451 sec
> - Yarn: 1136 sec
> *Yarn is about 2.5 times slower than MRv1*
> After a quick analysis, I think the direct cause might be that Yarn is much
> faster than MRv1 in the Map phase, but much slower in the Reduce phase.
> Here I have two questions:
> *- Why do my tests show Yarn is worse than MRv1 for terasort?*
> *- What's the strategy for tuning Yarn performance? Are there any materials?*
Marcos Ortiz Valmaseda
Product Manager at PDVSA