Re: Why my tests shows Yarn is worse than MRv1 for terasort?
I'm not an expert in tuning YARN, but you can try TeraSort, running
something similar on both MRv1 and YARN.
I think that Arun and his team could be a very good help with this.
Some links:
http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/
http://www.slideshare.net/tungld/terasort
http://sortbenchmark.org/
http://www.mapr.com/press-release/mapr-and-google-compute-engine-set-new-world-record-for-hadoop-terasort
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/examples/terasort/package-summary.html
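
As a rough sketch of running the same TeraGen/TeraSort job on both clusters,
the snippet below only builds the command lines; it does not run them. The
jar names and HDFS directories are hypothetical placeholders, and it assumes
"10 GB" means 10 * 10^9 bytes. TeraGen writes 100-byte rows, so the row count
is the data size divided by 100.

```python
# Sketch only: builds TeraGen/TeraSort command lines without executing them.
# Jar names and HDFS directories below are hypothetical placeholders.

ROW_BYTES = 100  # TeraGen writes fixed-size 100-byte rows


def rows_for(size_bytes):
    """Number of TeraGen rows needed to produce size_bytes of data."""
    return size_bytes // ROW_BYTES


def benchmark_commands(examples_jar, size_bytes, in_dir, out_dir):
    """Return the teragen and terasort argument lists for one cluster."""
    teragen = ["hadoop", "jar", examples_jar, "teragen",
               str(rows_for(size_bytes)), in_dir]
    terasort = ["hadoop", "jar", examples_jar, "terasort", in_dir, out_dir]
    return teragen, terasort


if __name__ == "__main__":
    ten_gb = 10 * 10**9  # assumption: decimal gigabytes
    clusters = [("MRv1", "hadoop-examples.jar"),            # placeholder name
                ("YARN", "hadoop-mapreduce-examples.jar")]  # placeholder name
    for label, jar in clusters:
        for cmd in benchmark_commands(jar, ten_gb, "/bench/in", "/bench/out"):
            print(label, " ".join(cmd))
```

Using the same row count and the same directories on both clusters keeps the
two runs comparable.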

If you do this, it would be nice to share your results in a blog post or
in a research article, to spread the word about your findings.
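
For reference, the ratios in the original message quoted below follow
directly from the reported timings. A minimal sketch of that arithmetic (the
function name is just illustrative):

```python
def times_longer(baseline_sec, other_sec):
    """How many times longer other_sec is than baseline_sec."""
    return other_sec / baseline_sec


# Timings in seconds, as reported in the quoted message.
teragen_ratio = times_longer(69, 193)     # MRv1 time relative to Yarn time
terasort_ratio = times_longer(451, 1136)  # Yarn time relative to MRv1 time

print(f"Teragen: MRv1 took {teragen_ratio:.1f}x as long as Yarn")
print(f"Terasort: Yarn took {terasort_ratio:.1f}x as long as MRv1")
```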

Best wishes.
2013/6/6 sam liu <[EMAIL PROTECTED]>

> At the beginning, I just wanted to do a quick comparison of MRv1 and Yarn.
> But they have many differences, and to keep the comparison fair I did not
> tune their configurations at all. That is how I got the above test results.
> After analyzing the results, I will certainly tune both and run the
> comparison again.
>
> Do you have any thoughts on the current test results? I think that, compared
> with MRv1, Yarn is better in the map phase (teragen test) but worse in the
> reduce phase (terasort test).
> Also, do you have any detailed suggestions, comments, or materials on Yarn
> performance tuning?
>
> Thanks!
>
>
> 2013/6/7 Marcos Luis Ortiz Valmaseda <[EMAIL PROTECTED]>
>
>> Why not tune the configurations?
>> Both frameworks have many areas to tune:
>> - Combiners, shuffle optimization, block size, etc.
>>
>>
>>
>> 2013/6/6 sam liu <[EMAIL PROTECTED]>
>>
>>> Hi Experts,
>>>
>>> We are considering whether to use Yarn in the near future, and
>>> I ran teragen/terasort on Yarn and MRv1 for comparison.
>>>
>>> My environment is a three-node cluster, and each node has similar
>>> hardware: 2 CPUs (4 cores), 32 mem. Both the Yarn and MRv1 clusters are
>>> set up in the same environment. To be fair, I did not do any performance
>>> tuning of their configurations, but used the default configuration values.
>>>
>>> Before testing, I thought Yarn would be much better than MRv1 if both
>>> used the default configuration, because Yarn is a better framework than
>>> MRv1. However, the test results show some differences:
>>>
>>> MRv1: Hadoop-1.1.1
>>> Yarn: Hadoop-2.0.4
>>>
>>> (A) Teragen: generate 10 GB data:
>>> - MRv1: 193 sec
>>> - Yarn: 69 sec
>>> *Yarn is 2.8 times faster than MRv1*
>>>
>>> (B) Terasort: sort 10 GB data:
>>> - MRv1: 451 sec
>>> - Yarn: 1136 sec
>>> *Yarn is 2.5 times slower than MRv1*
>>>
>>> After a quick analysis, I think the direct cause might be that Yarn is
>>> much faster than MRv1 in the map phase, but much slower in the reduce phase.
>>>
>>> Here I have two questions:
>>> *- Why do my tests show that Yarn is worse than MRv1 for terasort?*
>>> *- What is the strategy for tuning Yarn performance? Are there any
>>> materials?*
>>>
>>> Thanks!
>>>
>>
>>
>>
>> --
>> Marcos Ortiz Valmaseda
>> Product Manager at PDVSA
>> http://about.me/marcosortiz
>>
>>
>
--
Marcos Ortiz Valmaseda
Product Manager at PDVSA
http://about.me/marcosortiz