I'm using Hadoop 1.0.3 on a small cluster (1 namenode, 1 jobtracker, 2
compute nodes). My input is a sequence file of around 280 MB.
Generally, my jobs run fine and finish in 2-5 minutes. However, every so
often, seemingly at random, a job refuses to run. It is submitted and shows
up in 'hadoop job -list', but it does not appear on the jobtracker's web
page. If I manually enter the job ID on the web page I can see that it is
trying to run the setup task; the map tasks haven't even started. I've left
such jobs alone, and even after several minutes they are still in this state.
When I spot this, I kill the job and resubmit it, and it generally works.
A couple of times I have seen similar problems with reduce tasks that get
stuck while 'initializing'.
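
For reference, the manual check I do could be sketched roughly as below. This is only an illustration, not something I run in production. It assumes that 'hadoop job -list' prints one whitespace-separated row per job in the order JobId, State, StartTime (milliseconds), and that a job stuck in setup is reported with state 4 (Prep). The function names find_stuck_jobs and kill_job and the age threshold are made up for the example.

```python
import subprocess

# Assumption: 'hadoop job -list' reports a job still in setup as state 4 (Prep).
PREP_STATE = "4"


def find_stuck_jobs(listing, now_ms, max_age_ms, state=PREP_STATE):
    """Return IDs of jobs in `state` that were started more than max_age_ms ago.

    `listing` is the text printed by 'hadoop job -list'. Rows are assumed to
    look like: JobId State StartTime UserName Priority SchedulingInfo
    """
    stuck = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip the summary and header lines; only rows starting with a job ID count.
        if len(fields) < 3 or not fields[0].startswith("job_"):
            continue
        job_id, job_state, start_ms = fields[0], fields[1], fields[2]
        if job_state == state and start_ms.isdigit():
            if now_ms - int(start_ms) > max_age_ms:
                stuck.append(job_id)
    return stuck


def kill_job(job_id):
    """Kill a job through the same CLI I use by hand."""
    subprocess.check_call(["hadoop", "job", "-kill", job_id])
```

Resubmitting is left out on purpose, since it depends on how each job was launched in the first place.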