I am running hadoop-0.20.2 on a 20-node cluster.
When running a Map/Reduce job that uses several .jars loaded into the
DistributedCache, a few (~4) nodes have their map tasks fail with a
ClassNotFoundException. All the other nodes proceed through the job
normally and the job completes, but this wastes 20-25% of my
TaskTracker (TT) nodes.
Can anyone explain why some nodes might fail to read all the .jars from
the DistributedCache?