RE: Shuffle In Memory OutOfMemoryError

  Thanks Christopher.  

  The heap size for reduce tasks is configured to be 640M (mapred.child.java.opts set to -Xmx640m).
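For reference, the relevant settings from this thread would look roughly like the following mapred-site.xml fragment (a hypothetical sketch; the property names are the 0.20.x keys mentioned in the discussion, and the values match what was reported):

```xml
<!-- Hypothetical mapred-site.xml fragment; keys and values are taken
     from this thread (0.20.x property names), not from a live cluster. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx640m</value>
</property>
<property>
  <name>mapred.reduce.parallel.copies</name>
  <value>1</value>
</property>
```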

  Andy

-----Original Message-----
From: Christopher Douglas [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, March 09, 2010 5:19 PM
To: [EMAIL PROTECTED]
Subject: Re: Shuffle In Memory OutOfMemoryError

No, MR-1182 is included in 0.20.2

What heap size have you set for your reduce tasks? -C

Sent from my iPhone

On Mar 9, 2010, at 2:34 PM, "Ted Yu" <[EMAIL PROTECTED]> wrote:

> Andy:
> You need to manually apply the patch.
>
> Cheers
>
> On Tue, Mar 9, 2010 at 2:23 PM, Andy Sautins <[EMAIL PROTECTED]
> >wrote:
>
>>
>>  Thanks Ted.  My understanding is that MAPREDUCE-1182 is included
>> in the 0.20.2 release.  We upgraded our cluster to 0.20.2 this
>> weekend and re-ran the same job scenarios.  We ran with
>> mapred.reduce.parallel.copies set to 1 and continue to see the same
>> Java heap space error.
>>
>>
>>
>> -----Original Message-----
>> From: Ted Yu [mailto:[EMAIL PROTECTED]]
>> Sent: Tuesday, March 09, 2010 12:56 PM
>> To: [EMAIL PROTECTED]
>> Subject: Re: Shuffle In Memory OutOfMemoryError
>>
>> This issue has been resolved in
>> http://issues.apache.org/jira/browse/MAPREDUCE-1182
>>
>> Please apply the patch M1182-1v20.patch:
>> http://issues.apache.org/jira/secure/attachment/12424116/M1182-1v20.patch
>>
>> On Sun, Mar 7, 2010 at 3:57 PM, Andy Sautins <[EMAIL PROTECTED]
>>> wrote:
>>
>>>
>>> Thanks Ted.  Very helpful.  You are correct that I misunderstood
>>> the code at ReduceTask.java:1535.  I missed the fact that it's in
>>> an IOException catch block.  My mistake.  That's what I get for
>>> being in a rush.
>>>
>>> For what it's worth I did re-run the job with
>>> mapred.reduce.parallel.copies set with values from 5 all the way
>>> down to 1.  All failed with the same error:
>>>
>>> Error: java.lang.OutOfMemoryError: Java heap space
>>>       at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
>>>       at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>>>       at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>>       at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>>>
>>>  So from that it does seem like something else might be going on,
>>> yes?  I need to do some more research.
>>>
>>> I appreciate your insights.
>>>
>>> Andy
>>>
>>> -----Original Message-----
>>> From: Ted Yu [mailto:[EMAIL PROTECTED]]
>>> Sent: Sunday, March 07, 2010 3:38 PM
>>> To: [EMAIL PROTECTED]
>>> Subject: Re: Shuffle In Memory OutOfMemoryError
>>>
>>> My observation is based on this call chain:
>>> MapOutputCopier.run() calling copyOutput() calling getMapOutput()
>>> calling ramManager.canFitInMemory(decompressedLength)
>>>
>>> Basically ramManager.canFitInMemory() makes its decision without
>>> considering the number of MapOutputCopiers that are running.  Thus
>>> 1.25 * 0.7 of total heap may be used in shuffling if default
>>> parameters were used.
>>> Of course, you should check the value of
>>> mapred.reduce.parallel.copies to see if it is 5.  If it is 4 or
>>> lower, my reasoning wouldn't apply.
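To make the arithmetic above concrete, here is a small stand-alone sketch. It assumes the 0.20.x defaults discussed in this thread (a shuffle input buffer of 0.70 of the reduce heap, each copier allowed to reserve up to 25% of that buffer, and 5 parallel copies); the fractions are taken from the discussion, not re-derived from the 0.20.2 source:

```java
// Sketch of the worst-case in-memory shuffle reservation, assuming:
//   mapred.job.shuffle.input.buffer.percent = 0.70  (buffer fraction of heap)
//   per-copier reservation cap = 25% of that buffer
//   mapred.reduce.parallel.copies = 5
public class ShuffleBudget {
    public static void main(String[] args) {
        long heapBytes = 640L * 1024 * 1024;            // -Xmx640m reduce heap
        long memoryLimit = (long) (heapBytes * 0.70);   // in-memory shuffle buffer
        long singleLimit = (long) (memoryLimit * 0.25); // per-copier reservation cap
        int parallelCopies = 5;
        long worstCase = singleLimit * parallelCopies;  // all copiers reserve at once
        System.out.println("shuffle buffer:     " + memoryLimit / (1024 * 1024) + " MB");
        System.out.println("worst-case in use:  " + worstCase / (1024 * 1024) + " MB");
        System.out.println("exceeds the buffer: " + (worstCase > memoryLimit));
    }
}
```

With a 640M heap this works out to a 448 MB buffer but up to 560 MB reserved concurrently, i.e. 1.25 times the buffer (the 1.25 * 0.7 of heap figure above), which is consistent with the OutOfMemoryError in the shuffle.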
>>>
>>> About the ramManager.unreserve() call: ReduceTask.java from hadoop
>>> 0.20.2 only has 2731 lines, so I have to guess the location of the
>>> code snippet you provided.
>>> I found this around line 1535:
>>>       } catch (IOException ioe) {
>>>         LOG.info("Failed to shuffle from " +
>>>                  mapOutputLoc.getTaskAttemptId(), ioe);
>>>
>>>         // Inform the ram-manager
>>>         ramManager.closeInMemoryFile(mapOutputLength);
>>>         ramManager.unreserve(mapOutputLength);
>>>
>>>         // Discard the map-output