I'm experiencing the following crash during reduce tasks on Hadoop 1.0.3
(specifically, Amazon's EMR, AMI version 2.2.1). The crash is triggered
by highly unbalanced reducer inputs, i.e., when one reducer receives a
disproportionately large share of the records. (The reduce
task gets retried three times, but since the data is the same every
time, it crashes each time in the same place and the job fails.)
From the following links:
it seems as though Hadoop is supposed to prevent this from happening
by intelligently managing the amount of memory that is provided to the
shuffle. However, I don't know how ironclad this guarantee is.
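As a rough sketch of what I understand that memory management to be, here is a small model of how a Hadoop 1.x reducer budgets memory for fetched map outputs. It assumes the 1.x defaults `mapred.job.shuffle.input.buffer.percent` = 0.70 and `mapred.job.shuffle.merge.percent` = 0.66, plus a hard-coded 0.25 single-segment fraction in `ReduceTask`. I haven't verified these values against EMR's build, and the function names are hypothetical:

```python
# Rough model of Hadoop 1.x reduce-side shuffle memory budgeting.
# All constants are ASSUMED defaults, not values read from a live cluster.

SHUFFLE_INPUT_BUFFER_PERCENT = 0.70  # mapred.job.shuffle.input.buffer.percent
SHUFFLE_MERGE_PERCENT = 0.66         # mapred.job.shuffle.merge.percent
SINGLE_SEGMENT_FRACTION = 0.25       # assumed hard-coded fraction in ReduceTask
INT_MAX = 2**31 - 1                  # buffer size is capped at a Java int


def shuffle_budget(max_heap_bytes):
    """Return (in-memory buffer size, largest single segment kept in memory,
    usage level that triggers an in-memory merge), all in bytes."""
    max_size = min(int(max_heap_bytes * SHUFFLE_INPUT_BUFFER_PERCENT), INT_MAX)
    max_single = int(max_size * SINGLE_SEGMENT_FRACTION)
    merge_trigger = int(max_size * SHUFFLE_MERGE_PERCENT)
    return max_size, max_single, merge_trigger


def goes_to_disk(segment_bytes, max_heap_bytes):
    """Under this model, a fetched map output larger than the single-segment
    limit is written to disk instead of being held in memory."""
    _, max_single, _ = shuffle_budget(max_heap_bytes)
    return segment_bytes > max_single


if __name__ == "__main__":
    heap = 2**30  # hypothetical 1 GiB reducer heap
    print(shuffle_budget(heap))
    print(goes_to_disk(200 * 2**20, heap))  # a 200 MiB map output
```

Under this model, a 1 GiB reducer heap yields roughly a 717 MiB in-memory shuffle buffer, and any single map output above about 179 MiB is fetched straight to disk. Lowering `mapred.job.shuffle.input.buffer.percent` shrinks both limits.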
Can anyone advise me on how well Hadoop can be expected to handle this
issue when reducer inputs are highly unbalanced? Thanks very much for
your time.