I am running a Hadoop job on a 40-node cluster with about 300 map tasks
and about 300 reduce tasks. Most tasks complete within 20 minutes, but a
few, typically fewer than 10, run for many hours.
When they do complete, I see nothing to suggest that the number of bytes
or records read or written differs significantly from that of the tasks
that run much faster. I sometimes see multiple attempts of the same task,
usually only two, and the cluster is doing nothing else.
Any suggested tuning?