Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
MapReduce >> mail # user >> OutOfMemoryError during reduce shuffle

Copy link to this message
Re: OutOfMemoryError during reduce shuffle
There are a few tweaks In configuration that may help. Can you please look

Also, since you have mentioned reducers are unbalanced, could you use a
custom partitioner to balance out the outputs. Or just increase the number
of reducers so the load is spread out.


On Wednesday, February 20, 2013, Shivaram Lingamneni wrote:

> I'm experiencing the following crash during reduce tasks:
> https://gist.github.com/slingamn/04ff3ff3412af23aa50d
> on Hadoop 1.0.3 (specifically I'm using Amazon's EMR, AMI version
> 2.2.1). The crash is triggered by especially unbalanced reducer
> inputs, i.e., when one reducer receives too many records. (The reduce
> task gets retried three times, but since the data is the same every
> time, it crashes each time in the same place and the job fails.)
> From the following links:
> https://issues.apache.org/jira/browse/MAPREDUCE-1182
> http://hadoop-common.472056.n3.nabble.com/Shuffle-In-Memory-OutOfMemoryError-td433197.html
> it seems as though Hadoop is supposed to prevent this from happening
> by intelligently managing the amount of memory that is provided to the
> shuffle. However, I don't know how ironclad this guarantee is.
> Can anyone advise me on how robust I can expect Hadoop to be to this
> issue, in the face of highly unbalanced reducer inputs? Thanks very
> much for your time.