MapReduce, mail # user - OutOfMemoryError during reduce shuffle


Shivaram Lingamneni 2013-02-20, 06:45
Re: OutOfMemoryError during reduce shuffle
Hemanth Yamijala 2013-02-21, 01:41
There are a few configuration tweaks that may help. Can you please look at
http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#Shuffle%2FReduce+Parameters
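
For instance, a driver-side sketch of how those knobs could be turned
down (the property names and defaults are from the r1.0.4 tutorial
linked above; the values here are illustrative assumptions, not
recommendations):

    import org.apache.hadoop.mapred.JobConf;

    public class ShuffleTuning {
      public static JobConf tunedConf() {
        JobConf conf = new JobConf();
        // Fraction of the reduce task heap used to buffer fetched map
        // outputs during the copy phase (r1.0.4 default: 0.70).
        conf.setFloat("mapred.job.shuffle.input.buffer.percent", 0.40f);
        // In-memory usage level that triggers a merge to disk
        // (r1.0.4 default: 0.66).
        conf.setFloat("mapred.job.shuffle.merge.percent", 0.50f);
        // Segment count that triggers the in-memory merge (default: 1000).
        conf.setInt("mapred.inmem.merge.threshold", 500);
        // Keep no map outputs buffered in memory once the reduce begins.
        conf.setFloat("mapred.job.reduce.input.buffer.percent", 0.0f);
        return conf;
      }
    }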

Also, since you have mentioned the reducer inputs are unbalanced, could
you use a custom partitioner to balance out the outputs? Or just
increase the number of reducers so the load is spread out.
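
If a single hot key dominates, a partitioner along these lines could
scatter it across reducers (the Text/IntWritable types, the "HOT_KEY"
literal, and the random salting are illustrative assumptions; random
scattering is only safe if the partial results are re-combined in a
follow-up step):

    import java.util.Random;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.Partitioner;

    public class BalancingPartitioner
        implements Partitioner<Text, IntWritable> {
      private final Random random = new Random();

      public void configure(JobConf job) { }

      public int getPartition(Text key, IntWritable value,
                              int numPartitions) {
        if ("HOT_KEY".equals(key.toString())) {
          // Spread the hot key over every reducer instead of one.
          return random.nextInt(numPartitions);
        }
        // Default hash placement for all other keys.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
      }
    }

It would be wired in with conf.setPartitionerClass(BalancingPartitioner.class)
on the old (mapred) API.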

Thanks
Hemanth

On Wednesday, February 20, 2013, Shivaram Lingamneni wrote:

> I'm experiencing the following crash during reduce tasks:
>
> https://gist.github.com/slingamn/04ff3ff3412af23aa50d
>
> on Hadoop 1.0.3 (specifically I'm using Amazon's EMR, AMI version
> 2.2.1). The crash is triggered by especially unbalanced reducer
> inputs, i.e., when one reducer receives too many records. (The reduce
> task gets retried three times, but since the data is the same every
> time, it crashes each time in the same place and the job fails.)
>
> From the following links:
>
> https://issues.apache.org/jira/browse/MAPREDUCE-1182
>
>
> http://hadoop-common.472056.n3.nabble.com/Shuffle-In-Memory-OutOfMemoryError-td433197.html
>
> it seems as though Hadoop is supposed to prevent this from happening
> by intelligently managing the amount of memory that is provided to the
> shuffle. However, I don't know how ironclad this guarantee is.
>
> Can anyone advise me on how robust I can expect Hadoop to be against
> this issue, in the face of highly unbalanced reducer inputs? Thanks
> very much for your time.
>
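
For context on how ironclad that management is: going by the
MAPREDUCE-1182 discussion, the 1.x reduce derives its in-memory shuffle
budget from the task heap, roughly as modeled below (the 0.70 default
and the single-segment cap of about 25% are taken from that discussion;
this is an approximation for reasoning about limits, not the actual
Hadoop code):

    // Rough model of the 1.x shuffle memory accounting; an
    // approximation, not the real implementation.
    public class ShuffleBudget {
      public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory();
        // mapred.job.shuffle.input.buffer.percent (default 0.70)
        double bufferPercent = 0.70;
        long budget = (long) Math.min(maxHeap * bufferPercent,
                                      Integer.MAX_VALUE);
        // A single fetched map output may occupy at most ~25% of the
        // budget in memory; anything larger should go straight to disk.
        long maxInMemSegment = (long) (budget * 0.25);
        System.out.println("budget=" + budget
            + " maxInMemSegment=" + maxInMemSegment);
      }
    }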
Shivaram Lingamneni 2013-02-22, 06:56