When Reducers start running during a certain job (mapred.reduce.slowstart.completed.maps = 0.8), it takes about 20 minutes before the DN stops responding. This seems to be caused by a number of exceptions in the TT; at least, that is the only place I'm seeing errors. The three recurring ones are getMapOutput, EOFException and IllegalStateException. It seems related to https://issues.apache.org/jira/browse/MAPREDUCE-5. An excerpt from the logs is attached.
We're running Hadoop 0.20.2 on a 6-node (test) cluster with the following Java version:
# java -version
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
Can anybody shed some light on this?
Thanks a bunch,