Patai Sangbutsarakum 2013-08-09, 20:41
When I've had problems with a slow jobtracker, I've found the issue to be
one of the following two possibilities (so far):
- long GC pause (I'm guessing this is not it based on your email)
- hdfs is slow
I haven't dived into the code yet, but circumstantially I've found that
when you submit a job the jobtracker needs to put a bunch of files in HDFS,
such as the job.xml, the job jar, etc. I'm not sure how this scales with
larger and larger jobs, aside from the size of the serialized splits in
the job.xml, but if your HDFS is slow for any reason it can cause pauses in
your jobtracker. That blocks other jobs from being submitted and also hangs
the 50030 web UI.
I'd take a look at your namenode logs. When the jobtracker logs pause, do
you see a corresponding pause in the namenode logs? What gets spewed
before and after that pause?
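
To make that comparison less tedious, here is a rough Python sketch that lists the gaps in a log file. It only assumes that each entry starts with a timestamp in the same format as the jobtracker lines quoted below (2013-08-08 00:23:00,509). The function name find_pauses and the 5 second threshold are my own choices, not anything that ships with Hadoop.

```python
import re
import sys
from datetime import datetime

# Matches the leading timestamp of a log entry, e.g. "2013-08-08 00:23:00,509".
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")


def find_pauses(lines, min_gap_seconds=5.0):
    """Return (gap_seconds, line_before, line_after) for every gap between
    consecutive timestamped lines that is at least min_gap_seconds long."""
    pauses = []
    prev_time, prev_line = None, None
    for line in lines:
        match = TS.match(line)
        if not match:
            # Continuation lines (stack traces, wrapped text) have no timestamp.
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        if prev_time is not None:
            gap = (stamp - prev_time).total_seconds()
            if gap >= min_gap_seconds:
                pauses.append((gap, prev_line.rstrip(), line.rstrip()))
        prev_time, prev_line = stamp, line
    return pauses


if __name__ == "__main__":
    # Usage (paths are whatever your log files are called):
    #   python find_pauses.py <jobtracker log> <namenode log>
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            for gap, before, after in find_pauses(handle):
                print("%s: %.1fs pause" % (path, gap))
                print("  before: " + before)
                print("  after:  " + after)
```

Run it over the jobtracker log and the namenode log and see whether the pauses line up in time. A quiet log is not proof of a stall by itself, so treat the output as a pointer to where to read more closely.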
On Fri, Aug 9, 2013 at 4:41 PM, Patai Sangbutsarakum <
[EMAIL PROTECTED]> wrote:
> A while back, I was fighting with the jobtracker page hanging: when I browsed
> to http://jobtracker:50030 the browser didn't show job info as usual, which
> turned out to be caused by keeping too much job history in the jobtracker.
> Currently, I am setting up a new cluster with a 40g heap on the namenode and
> jobtracker, on dedicated machines. The not-fun part starts here: a developer
> tried to test out the cluster by launching a 76k map job (the cluster has
> around 6k-ish mappers).
> Job initialization succeeded, and the job finished.
> However, before the job actually starts running, I can't access the
> jobtracker page anymore, the same symptom as above.
> I see a bunch of these in the jobtracker log:
> 2013-08-08 00:23:00,509 INFO org.apache.hadoop.mapred.JobInProgress:
> tip:task_201307291733_0619_m_076796 has split on node: /rack/node
> until I see this:
> INFO org.apache.hadoop.mapred.JobInProgress: job_201307291733_0619
> 2013-08-08 00:23:00,509 INFO org.apache.hadoop.mapred.JobInProgress: Job
> job_201307291733_0619 initialized successfully with 76797 map tasks and 10
> reduce tasks.
> That's when I can access the jobtracker page again.
> CPU load on the jobtracker is very low, the jobtracker's heap is far from
> full (about 1-ish GB used out of 40), and
> network bandwidth is far from saturated.
> I'm running the 0.20.2 branch on CentOS 6.4 with Java(TM) SE Runtime
> Environment (build 1.6.0_32-b05).
> What would be the root cause I should be looking at, or at least where to
> Thank you in advance
Patai Sangbutsarakum 2013-08-10, 01:57