Re: Why my tests shows Yarn is worse than MRv1 for terasort?
Increasing the slowstart is not meant to increase performance, but should
make for a fairer comparison.  Have you tried making sure that in MR2 only
8 map tasks are running concurrently, or boosting MR1 up to 16?
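
A rough sketch of those two options in mapred-site.xml terms, using the
12288/768 = 16 figure quoted below (values are illustrative only):

  <!-- Option A: cap MR2 at 8 concurrent map tasks per node by doubling the
       map container size, since 12288 / 1536 = 8 -->
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1536</value>
  </property>

  <!-- Option B: raise MR1 to 16 concurrent map tasks per TaskTracker -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>16</value>
  </property>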

-Sandy
On Wed, Oct 23, 2013 at 12:55 PM, Jian Fang
<[EMAIL PROTECTED]> wrote:

> Changing mapreduce.job.reduce.slowstart.completedmaps to 0.99 does not
> look good. The map phase alone took 48 minutes and the total time seems to
> be even longer. Any way to make the map phase run faster?
>
> Thanks.
>
>
> On Wed, Oct 23, 2013 at 10:05 AM, Jian Fang <[EMAIL PROTECTED]
> > wrote:
>
>> Thanks Sandy.
>>
>> io.sort.record.percent is at the default value 0.05 for both MR1 and MR2.
>> mapreduce.job.reduce.slowstart.completedmaps in MR2 and mapred.reduce.slowstart.completed.maps
>> in MR1 both use the default value 0.05.
>>
>> I tried allocating 1536MB and 1024MB to the map container some time ago,
>> but the changes did not give me a better result, so I changed it back to
>> 768MB.
>>
>> Will try mapred.reduce.slowstart.completed.maps=.99 to see what happens.
>> BTW, I should use
>> mapreduce.job.reduce.slowstart.completedmaps in MR2, right?
>>
>> Also, in MR1 I can specify tasktracker.http.threads, but I could not find
>> the counterpart for MR2. Which one should I tune for the http threads?
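
For what it's worth: in MR2 the map output is served by the NodeManager's
ShuffleHandler auxiliary service rather than by the TaskTracker, so the
closest counterpart to tasktracker.http.threads is probably
mapreduce.shuffle.max.threads (default 0, meaning twice the number of
available processors). The value below is illustrative only:

  <!-- mapred-site.xml on each NodeManager host -->
  <property>
    <name>mapreduce.shuffle.max.threads</name>
    <value>64</value>
  </property>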
>>
>> Thanks again.
>>
>>
>> On Wed, Oct 23, 2013 at 9:40 AM, Sandy Ryza <[EMAIL PROTECTED]> wrote:
>>
>>> Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are taking
>>> about three times as long in MR2 as they are in MR1.  This is probably
>>> because you allow twice as many map tasks to run at a time in MR2 (12288/768
>>> = 16).  Being able to use all the containers isn't necessarily a good thing
>>> if you are oversubscribing your node's resources.  Because of the different
>>> way that MR1 and MR2 view resources, I think it's better to test with
>>> mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
>>> phases will run separately.
>>>
>>> On the other hand, it looks like your MR1 job has more spilled records
>>> than MR2. You should set io.sort.record.percent in MR1 to .13, which
>>> should improve MR1 performance and provide a fairer comparison (MR2 does
>>> this tuning for you automatically).
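
A minimal sketch of the two MR1 settings suggested above, as they might
appear in mapred-site.xml (the values are just the ones proposed here):

  <property>
    <name>mapred.reduce.slowstart.completed.maps</name>
    <value>0.99</value>  <!-- launch reduces only once nearly all maps finish -->
  </property>
  <property>
    <name>io.sort.record.percent</name>
    <value>0.13</value>  <!-- more of io.sort.mb for record metadata, fewer spills -->
  </property>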
>>>
>>> -Sandy
>>>
>>>
>>> On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <
>>> [EMAIL PROTECTED]> wrote:
>>>
>>>> For MR1, the map and reduce slots on each data node are 8 and 3,
>>>> respectively. Since MR2 can use all containers for either map or reduce,
>>>> I would expect MR2 to be faster.
>>>>
>>>>
>>>> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> How many map and reduce slots are you using per tasktracker in MR1?
>>>>> How do the average map times compare? (MR2 reports this directly on the
>>>>> web UI, but you can also get a sense in MR1 by scrolling through the map
>>>>> tasks page.) Can you share the counters for MR1?
>>>>>
>>>>> -Sandy
>>>>>
>>>>>
>>>>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>>>>> [EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Unfortunately, turning off JVM reuse still gave the same result, i.e.,
>>>>>> about 90 minutes for MR2. I don't think the killed reduces could account
>>>>>> for a 2x slowdown. There must be something very wrong in either the
>>>>>> configuration or the code. Any hints?
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>>>>> [EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>>> Thanks Sandy. I will try turning JVM reuse off and see what happens.
>>>>>>>
>>>>>>> Yes, I saw quite a few exceptions in the task attempts. For instance:
>>>>>>>
>>>>>>>
>>>>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file