OutOfMemoryError during reduce shuffle
I'm experiencing the following crash during reduce tasks:

https://gist.github.com/slingamn/04ff3ff3412af23aa50d

on Hadoop 1.0.3 (specifically I'm using Amazon's EMR, AMI version
2.2.1). The crash is triggered by especially unbalanced reducer
inputs, i.e., when one reducer receives too many records. (The reduce
task gets retried three times, but since the data is the same every
time, it crashes each time in the same place and the job fails.)

From the following links:

https://issues.apache.org/jira/browse/MAPREDUCE-1182

http://hadoop-common.472056.n3.nabble.com/Shuffle-In-Memory-OutOfMemoryError-td433197.html

it seems as though Hadoop is supposed to prevent this from happening
by intelligently managing the amount of memory that is provided to the
shuffle. However, I don't know how ironclad this guarantee is.
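
To make the memory budget in question concrete, here is a rough sketch of the arithmetic as I understand it. It is not Hadoop's actual code. It assumes the Hadoop 1.x property mapred.job.shuffle.input.buffer.percent defaults to 0.70 of the reduce task's heap, and that a single map output is only copied into memory if it is smaller than 25% of that budget; both numbers are assumptions to check against your version. The class and method names (ShuffleBudget and so on) are made up for illustration.

```java
public class ShuffleBudget {
    // Assumed Hadoop 1.x default for mapred.job.shuffle.input.buffer.percent.
    static final double DEFAULT_INPUT_BUFFER_PERCENT = 0.70;
    // Assumed fraction of the budget that one map output may occupy in memory.
    static final double MAX_SINGLE_SEGMENT_FRACTION = 0.25;

    // Bytes of heap set aside for holding map outputs in memory during the shuffle.
    // The heap is capped at Integer.MAX_VALUE here on the assumption that the
    // budget is tracked as an int.
    static long shuffleBudgetBytes(long heapBytes, double inputBufferPercent) {
        long capped = Math.min(heapBytes, (long) Integer.MAX_VALUE);
        return (long) (capped * inputBufferPercent);
    }

    // Largest single map output that would still be shuffled into memory.
    static long maxSingleSegmentBytes(long budgetBytes) {
        return (long) (budgetBytes * MAX_SINGLE_SEGMENT_FRACTION);
    }

    // True if a map output of this size would go to memory rather than to disk.
    static boolean fitsInMemory(long mapOutputBytes, long budgetBytes) {
        return mapOutputBytes < maxSingleSegmentBytes(budgetBytes);
    }

    public static void main(String[] args) {
        long heap = 1024L * 1024 * 1024; // hypothetical 1 GiB reduce task heap
        long budget = shuffleBudgetBytes(heap, DEFAULT_INPUT_BUFFER_PERCENT);
        System.out.println("shuffle budget bytes: " + budget);
        System.out.println("max in-memory segment bytes: " + maxSingleSegmentBytes(budget));
    }
}
```

Note that this budget is a fraction of the whole heap and does not account for whatever else the task allocates, which is part of why I'm unsure how firm the guarantee is.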

Can anyone advise me on how robust I can expect Hadoop to be to this
issue, in the face of highly unbalanced reducer inputs? Thanks very
much for your time.
Hemanth Yamijala 2013-02-21, 01:41
Shivaram Lingamneni 2013-02-22, 06:56