Hadoop, mail # user - Why my tests shows Yarn is worse than MRv1 for terasort?


Re: Why my tests shows Yarn is worse than MRv1 for terasort?
sam liu 2013-06-07, 03:34
Thanks very much! I agree that Arun and his team could be
a great help with this.

Can any expert offer more comments or analysis on my tests above?

Thanks in advance!
2013/6/7 Marcos Luis Ortiz Valmaseda <[EMAIL PROTECTED]>

> I'm not an expert in tuning YARN, but you could run Terasort in a
> similar way on both MRv1 and YARN.
> I think that Arun and his team could be a great help with this.
> Some links:
>
> http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/
> http://www.slideshare.net/tungld/terasort
> http://sortbenchmark.org/
>
> http://www.mapr.com/press-release/mapr-and-google-compute-engine-set-new-world-record-for-hadoop-terasort
>
> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/examples/terasort/package-summary.html
>
> If you do this, it would be nice to share your results in a blog post or
> in a research article, to spread the word about your findings.
>
> Best wishes.
>
>
> 2013/6/6 sam liu <[EMAIL PROTECTED]>
>
>> At the beginning, I just wanted to do a quick comparison of MRv1 and Yarn.
>> But they have many differences, and to keep the comparison fair I did not
>> tune their configurations at all. That is how I got the above test results.
>> After analyzing the results, I will certainly tune both and run the
>> comparison again.
>>
>> Do you have any thoughts on the current test results? I think that, compared
>> with MRv1, Yarn is better in the Map phase (teragen test) but worse in the
>> Reduce phase (terasort test).
>> Also, do you have any detailed suggestions, comments, or materials on Yarn
>> performance tuning?
>>
>> Thanks!
>>
>>
>> 2013/6/7 Marcos Luis Ortiz Valmaseda <[EMAIL PROTECTED]>
>>
>>> Why not tune the configurations?
>>> Both frameworks have many areas to tune:
>>> - Combiners, shuffle optimization, block size, etc.
>>>
>>>
>>>
>>> 2013/6/6 sam liu <[EMAIL PROTECTED]>
>>>
>>>> Hi Experts,
>>>>
>>>> We are considering whether to adopt Yarn in the near future,
>>>> and I ran teragen/terasort on Yarn and MRv1 for comparison.
>>>>
>>>> My environment is a three-node cluster, and each node has similar hardware: 2
>>>> cpu(4 core), 32 mem. Both the Yarn and MRv1 clusters are set up in the same
>>>> environment. To be fair, I did not do any performance tuning on their
>>>> configurations, but used the default configuration values.
>>>>
>>>> Before testing, I expected Yarn to be much better than MRv1 when both
>>>> use the default configuration, because Yarn is a better framework than MRv1.
>>>> However, the test results are mixed:
>>>>
>>>> MRv1: Hadoop-1.1.1
>>>> Yarn: Hadoop-2.0.4
>>>>
>>>> (A) Teragen: generate 10 GB of data:
>>>> - MRv1: 193 sec
>>>> - Yarn: 69 sec
>>>> *Yarn is 2.8 times faster than MRv1*
>>>>
>>>> (B) Terasort: sort 10 GB of data:
>>>> - MRv1: 451 sec
>>>> - Yarn: 1136 sec
>>>> *Yarn is 2.5 times slower than MRv1*
>>>>
>>>> After a quick analysis, I think the direct cause might be that Yarn is
>>>> much faster than MRv1 in the Map phase, but much slower in the Reduce phase.
>>>>
>>>> I have two questions:
>>>> *- Why do my tests show Yarn performing worse than MRv1 for terasort?*
>>>> *- What is the strategy for tuning Yarn performance? Are there any materials?*
>>>>
>>>> Thanks!
>>>>
>>>
>>>
>>>
>>> --
>>> Marcos Ortiz Valmaseda
>>> Product Manager at PDVSA
>>> http://about.me/marcosortiz
>>>
>>>
>>
>
>
> --
> Marcos Ortiz Valmaseda
> Product Manager at PDVSA
> http://about.me/marcosortiz
>
>
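
For readers who want to check the ratios quoted in the original post, here is a minimal sketch in Python. The timings are the ones reported in the thread. The function names `speedup` and `summarize` and the `TIMINGS` dictionary are hypothetical helpers made up for this illustration and are not part of Hadoop.

```python
# Wall-clock times in seconds, as reported in the thread (10 GB data set).
TIMINGS = {
    "teragen": {"MRv1": 193, "Yarn": 69},
    "terasort": {"MRv1": 451, "Yarn": 1136},
}


def speedup(baseline_sec, candidate_sec):
    """Return how many times faster the candidate is than the baseline.

    A value above 1 means the candidate finished sooner; a value below 1
    means it took longer.
    """
    return baseline_sec / candidate_sec


def summarize(timings):
    """Build one human-readable line per job, comparing Yarn against MRv1."""
    lines = []
    for job, t in timings.items():
        ratio = speedup(t["MRv1"], t["Yarn"])
        if ratio >= 1:
            lines.append(f"{job}: Yarn is {ratio:.1f}x faster than MRv1")
        else:
            # Invert the ratio so the slowdown is reported as a number above 1.
            lines.append(f"{job}: Yarn is {1 / ratio:.1f}x slower than MRv1")
    return lines


if __name__ == "__main__":
    for line in summarize(TIMINGS):
        print(line)
```

This only reproduces the arithmetic behind the "2.8 times" and "2.5 times" figures. It says nothing about why the reduce-heavy job was slower, which would need per-phase timings from the job counters on both clusters.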