A while back, I was fighting with JobTracker page hangs: when I browsed
to http://jobtracker:50030, the browser didn't show the job info as usual.
It turned out to be caused by keeping too much job history in the JobTracker.
Currently, I am setting up a new cluster with a 40 GB heap for the NameNode
and the JobTracker, each on a dedicated machine. The not-fun part starts here:
a developer tried to test the cluster by launching a job with 76k maps (the
cluster has around 6k mappers).
Job initialization succeeded, and the job finished.
However, before the job actually started running, I couldn't access the
JobTracker page anymore, with the same symptom as above.
I see a bunch of these in the JobTracker log:
2013-08-08 00:23:00,509 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201307291733_0619_m_076796 has split on node: /rack/node
This continues until I see:
INFO org.apache.hadoop.mapred.JobInProgress: job_201307291733_0619
2013-08-08 00:23:00,509 INFO org.apache.hadoop.mapred.JobInProgress: Job job_201307291733_0619 initialized successfully with 76797 map tasks and 10
That's when I can access the JobTracker page again.
CPU load on the JobTracker is very light, the JobTracker heap is far from
full (about 1 GB used out of 40 GB), and network bandwidth is far from
saturated.
I'm running the 0.20.2 branch on CentOS 6.4 with Java(TM) SE Runtime
Environment (build 1.6.0_32-b05).
What could be the root cause, or at least, where should I start looking?
Thanks in advance.