Hadoop >> mail # user >> Shuffle In Memory OutOfMemoryError


Re: Shuffle In Memory OutOfMemoryError
I don't think this OOM is a framework bug per se, and given the  
rewrite/refactoring of the shuffle in MAPREDUCE-318 (in 0.21), tuning  
the 0.20 shuffle semantics is likely not worthwhile (though data  
informing improvements to trunk would be excellent). Most likely (and  
tautologically), ReduceTask simply requires more memory than is  
available and the job failure can be avoided by either 0) increasing  
the heap size or 1) lowering mapred.job.shuffle.input.buffer.percent. Most  
of the tasks we run have a heap of 1GB. For a reduce fetching >200k  
map outputs, that's a reasonable, even stingy amount of space. -C
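
To make the trade-off between the two remedies concrete, here is a rough Java sketch of the memory arithmetic. It assumes the 0.20 shuffle reserves a fixed fraction of the reduce task's heap for in-memory map outputs (believed to be 0.70 by default) and keeps a single map output in memory only if it is under a quarter of that budget. Both constants and the class name ShuffleBudget are assumptions for illustration, not Hadoop code, so check them against the ReduceTask source for your release.

```java
public class ShuffleBudget {
    // Assumed 0.20 defaults; verify against ReduceTask and mapred-default.xml.
    public static final double DEFAULT_INPUT_BUFFER_PERCENT = 0.70;
    public static final double SINGLE_SEGMENT_FRACTION = 0.25;

    /** Bytes of heap the in-memory shuffle may hold at once. */
    public static long bufferBytes(long heapBytes, double inputBufferPercent) {
        return Math.round(heapBytes * inputBufferPercent);
    }

    /** Largest single map output that would still be shuffled into memory. */
    public static long singleSegmentLimit(long bufferBytes) {
        return Math.round(bufferBytes * SINGLE_SEGMENT_FRACTION);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] heapsMb = {640L, 1024L};          // the two heap sizes mentioned in the thread
        double[] percents = {DEFAULT_INPUT_BUFFER_PERCENT, 0.50};
        for (long heapMb : heapsMb) {
            for (double percent : percents) {
                long heap = heapMb * mb;
                long buffer = bufferBytes(heap, percent);
                // "left" is what remains for the merge, the reducer itself, and the JVM.
                System.out.printf("heap=%dMB percent=%.2f buffer=%dMB left=%dMB%n",
                        heapMb, percent, buffer / mb, (heap - buffer) / mb);
            }
        }
    }
}
```

Either remedy widens the gap between the shuffle buffer and the total heap: raising the heap does so in absolute terms, while lowering the percentage trades in-memory shuffle capacity for headroom.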

On Mar 10, 2010, at 5:26 AM, Ted Yu wrote:

> I verified that size and maxSize are long. This means MR-1182 didn't
> resolve Andy's issue.
>
> According to Andy:
> At the beginning of the job there are 209,754 pending map tasks and 32
> pending reduce tasks
>
> My guess is that GC wasn't reclaiming memory fast enough, leading to an
> OOME because of the large number of in-memory shuffle candidates.
>
> My suggestion for Andy would be to:
> 1. add -verbose:gc as a JVM parameter
> 2. modify reserve() slightly to calculate the maximum outstanding
> numPendingRequests and print the maximum.
>
> Based on the output from the two items above, we can discuss a solution.
> My intuition is to place an upper bound on numPendingRequests beyond which
> canFitInMemory() returns false.
>
> My two cents.
>
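
The two ideas above (recording the peak number of waiting requests, and refusing in-memory shuffles once too many are waiting) can be sketched in a standalone class. This is not the Hadoop ShuffleRamManager: the class name BoundedRamManager, the maxPending cap, and the accessor methods are hypothetical, and the real reserve() also deals with input streams, which this sketch leaves out.

```java
public class BoundedRamManager {
    private final long maxSize;      // total bytes available for in-memory shuffle
    private final int maxPending;    // hypothetical cap, not an existing Hadoop setting
    private long size = 0;
    private int numPendingRequests = 0;
    private int maxObservedPending = 0;

    public BoundedRamManager(long maxSize, int maxPending) {
        this.maxSize = maxSize;
        this.maxPending = maxPending;
    }

    /** False when the request is too large or too many requests are already waiting. */
    public synchronized boolean canFitInMemory(long requestedSize) {
        return requestedSize <= maxSize && numPendingRequests < maxPending;
    }

    /** Blocks until requestedSize bytes fit under maxSize, tracking how many callers wait. */
    public synchronized void reserve(long requestedSize) throws InterruptedException {
        if (requestedSize > maxSize) {
            throw new IllegalArgumentException("request can never be fulfilled");
        }
        while (size + requestedSize > maxSize) {
            numPendingRequests++;
            maxObservedPending = Math.max(maxObservedPending, numPendingRequests);
            try {
                wait();              // woken by unreserve()
            } finally {
                numPendingRequests--;
            }
        }
        size += requestedSize;
    }

    /** Releases bytes and wakes every waiter so each can re-check the condition. */
    public synchronized void unreserve(long releasedSize) {
        size -= releasedSize;
        notifyAll();
    }

    public synchronized long getSize() { return size; }
    public synchronized int getNumPendingRequests() { return numPendingRequests; }
    public synchronized int getMaxObservedPending() { return maxObservedPending; }
}
```

A caller would check canFitInMemory() first and fall back to shuffling to disk when it returns false. Logging getMaxObservedPending() at the end of the copy phase gives the figure asked for in item 2.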
> On Tue, Mar 9, 2010 at 11:51 PM, Christopher Douglas <[EMAIL PROTECTED]> wrote:
>
>> That section of code is unmodified in MR-1182. See the patches/svn  
>> log. -C
>>
>> Sent from my iPhone
>>
>>
>> On Mar 9, 2010, at 7:44 PM, "Ted Yu" <[EMAIL PROTECTED]> wrote:
>>
>>> I just downloaded the hadoop-0.20.2 tarball from a Cloudera mirror.
>>> This is what I see in ReduceTask (line 999):
>>>    public synchronized boolean reserve(int requestedSize, InputStream in)
>>>    throws InterruptedException {
>>>      // Wait till the request can be fulfilled...
>>>      while ((size + requestedSize) > maxSize) {
>>>
>>> I don't see the fix from MR-1182.
>>>
>>> That's why I suggested to Andy that he manually apply MR-1182.
>>>
>>> Cheers
>>>
>>> On Tue, Mar 9, 2010 at 5:01 PM, Andy Sautins <[EMAIL PROTECTED]> wrote:
>>>
>>>
>>>> Thanks Christopher.
>>>>
>>>> The heap size for reduce tasks is configured to be 640M (
>>>> mapred.child.java.opts set to -Xmx640m ).
>>>>
>>>> Andy
>>>>
>>>> -----Original Message-----
>>>> From: Christopher Douglas [mailto:[EMAIL PROTECTED]]
>>>> Sent: Tuesday, March 09, 2010 5:19 PM
>>>> To: [EMAIL PROTECTED]
>>>> Subject: Re: Shuffle In Memory OutOfMemoryError
>>>>
>>>> No, MR-1182 is included in 0.20.2
>>>>
>>>> What heap size have you set for your reduce tasks? -C
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Mar 9, 2010, at 2:34 PM, "Ted Yu" <[EMAIL PROTECTED]> wrote:
>>>>
>>>> Andy:
>>>>> You need to manually apply the patch.
>>>>>
>>>>> Cheers
>>>>>
>>>>> On Tue, Mar 9, 2010 at 2:23 PM, Andy Sautins <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Thanks Ted.  My understanding is that MAPREDUCE-1182 is included in the
>>>>>> 0.20.2 release.  We upgraded our cluster to 0.20.2 this weekend and re-ran
>>>>>> the same job scenarios.  Running with mapred.reduce.parallel.copies set
>>>>>> to 1, we continue to have the same Java heap space error.
>>>>>>
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Ted Yu [mailto:[EMAIL PROTECTED]]
>>>>>> Sent: Tuesday, March 09, 2010 12:56 PM
>>>>>> To: [EMAIL PROTECTED]
>>>>>> Subject: Re: Shuffle In Memory OutOfMemoryError
>>>>>>
>>>>>> This issue has been resolved in
>>>>>> http://issues.apache.org/jira/browse/MAPREDUCE-1182
>>>>>>
>>>>>> Please apply the patch M1182-1v20.patch:
>>>>>> http://issues.apache.org/jira/secure/attachment/12424116/M1182-1v20.patch
>>>>>>
>>>>>> On Sun, Mar 7, 2010 at 3:57 PM, Andy Sautins <