Could you post the stack trace from the job logs? Also, looking at the TaskTracker
logs on the failed nodes may help.
On Friday, January 25, 2013, Terry Healy wrote:
> Running hadoop-0.20.2 on a 20-node cluster.
> When running a Map/Reduce job that uses several .jars loaded into the
> Distributed cache, the map tasks on several (~4) nodes fail because
> of ClassNotFoundException. All the other nodes proceed through the job
> normally and the job completes. But this is wasting 20-25% of my TT nodes.
> Can anyone explain why some nodes might fail to read all the .jars from
> the Distributed cache?