Does this happen all of the time?
I've seen cases where a job or process that is hosing the CPU of a
tasktracker causes the jobtracker to pause while it tries to contact that
tasktracker. Once the CPU load drops back to acceptable levels,
communication flows again and the jobtracker resumes. Could that be
happening to you? When it happens, requests to the web UI for that
tasktracker don't respond either.
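To see whether a slow jobtracker page lines up with an unresponsive
tasktracker, one option is to time both web UIs from the same check. Below
is a minimal Python sketch. It assumes the Hadoop 1.x default web UI ports
(50030 for the jobtracker, 50060 for the tasktracker). The hostname
"tasktracker01" and the tasktracker page path are hypothetical placeholders,
so adjust them for your cluster and version.

```python
import http.client
import time
import urllib.request

# Hypothetical hostnames/paths; 50030 and 50060 are the Hadoop 1.x default
# jobtracker and tasktracker web UI ports.
URLS = [
    "http://jobtracker:50030/jobtracker.jsp",
    "http://tasktracker01:50060/tasktracker.jsp",
]
TIMEOUT_SECONDS = 180  # same timeout as the monitoring check in the question


def time_page(url, timeout=TIMEOUT_SECONDS):
    """Fetch url and return (ok, elapsed_seconds, error_message_or_None)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # read the whole body so slow page rendering is timed too
        ok, error = True, None
    except (OSError, http.client.HTTPException) as exc:
        # URLError/HTTPError, refused connections and socket timeouts are OSErrors
        ok, error = False, str(exc) or type(exc).__name__
    return ok, time.monotonic() - start, error


def report(urls=URLS, timeout=TIMEOUT_SECONDS):
    """Return one human-readable status line per URL."""
    lines = []
    for url in urls:
        ok, elapsed, error = time_page(url, timeout)
        status = "OK" if ok else f"FAIL ({error})"
        lines.append(f"{url}: {status} after {elapsed:.1f}s")
    return lines
```

If you run `report()` periodically and log its output next to each
tasktracker's CPU numbers, you can check whether the slow jobtracker page
coincides with a busy tasktracker.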
This usually happened to me when I was running too many concurrent tasks
on each tasktracker, though the real cause was saturating the disk: the
resulting iowait made processes pile up. Lowering the number of tasks per
tasktracker reduced disk contention, which cleared the iowait and let the
waiting processes finish. If your tasks are not disk-heavy you can run
more tasks per tasktracker, but we had a few disk-heavy jobs, and we only
saw this when those ran.
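To confirm the disk-bound case on a tasktracker host, you can watch the
share of CPU time spent in iowait. Below is a minimal Python sketch. It
assumes Linux, because it reads the aggregate "cpu" line of /proc/stat. The
function names are hypothetical helpers and are not part of Hadoop.

```python
import time


def parse_cpu_line(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat text as ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(v) for v in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two /proc/stat samples."""
    b, a = parse_cpu_line(before), parse_cpu_line(after)
    # Field order: user nice system idle iowait irq softirq steal [guest guest_nice].
    # guest/guest_nice are already counted in user/nice, so sum only the first 8.
    n = min(len(b), len(a), 8)
    total = sum(a[:n]) - sum(b[:n])
    iowait = a[4] - b[4]
    return 100.0 * iowait / total if total > 0 else 0.0


def sample_iowait(interval=5.0, path="/proc/stat"):
    """Take two samples `interval` seconds apart and return the iowait share."""
    with open(path) as f:
        before = f.read()
    time.sleep(interval)
    with open(path) as f:
        after = f.read()
    return iowait_percent(before, after)
```

If iowait stays high while the disk-heavy jobs run, the fix described above
in Hadoop 1.x terms is to lower the per-tasktracker slot counts
(`mapred.tasktracker.map.tasks.maximum` and
`mapred.tasktracker.reduce.tasks.maximum`).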
On Thu, Nov 1, 2012 at 4:47 PM, Patai Sangbutsarakum <[EMAIL PROTECTED]> wrote:
> I have a check monitoring the page jobtracker:50030/jobtracker.jsp,
> and the check times out (180 sec) pretty often.
> When I browse to the page myself, it takes anywhere from 5 sec to 5
> minutes to render.
> The heap size is 12G, and according to top it is not used up.
> CPU load is around 1 (on all three load averages).
> Disk is far from full.
> Swap is untouched.
> Any input on how to troubleshoot this would be useful and greatly
> appreciated.