MapReduce, mail # user - What to do/check/debug/root cause analysis when jobtracker hang


Patai Sangbutsarakum 2013-02-04, 23:21
Lately, the jobtracker in one of our production clusters has been falling into a hung state.
The load averages are all around 1;
in top, the jobtracker process shows 100% CPU the whole time.

So I went ahead and tried top -H -p jobtracker_pid, and I always see one
thread that sits at 100% CPU the whole time.

Unless we restart the jobtracker, the hang never goes away.

I found OOM errors in the jobtracker log file from the time of the hang.

How can I find out what is really going on in the one and only
thread that is at 100% CPU?

How can I prove whether we run out of memory because of the number of jobs
_OR_
because there is a memory leak on the application side?
I tried dumping with jstack, and also looked at http://jobtracker:50030/stacks

I just don't know what I should be looking for in the output of those commands.
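
For reference, here is a minimal sketch of how the busy thread from top -H could be matched to its entry in a jstack dump. It assumes the usual HotSpot-on-Linux format, where each thread header in the dump carries nid=0x<hex> and that value is the same thread ID that top -H prints in decimal. The function name find_thread and the sample dump are made up for illustration; they are not part of Hadoop or the JDK.

```python
import re

# Matches a HotSpot jstack thread header line, for example:
#   "worker" daemon prio=10 tid=0x02 nid=0x1a2b runnable [0x0]
# Assumption: the dump uses this nid=0x<hex> convention.
HEADER = re.compile(r'^"(?P<name>.*)".*\bnid=(?P<nid>0x[0-9a-fA-F]+)')


def find_thread(jstack_text, tid):
    """Return the jstack block (header plus frames) for Linux thread id `tid`.

    `tid` is the decimal thread ID as shown by top -H.
    Returns None if no thread in the dump has that ID.
    """
    block = None
    for line in jstack_text.splitlines():
        m = HEADER.match(line)
        if m:
            if block is not None:
                break  # reached the next thread header; our block is done
            # jstack prints the ID in hex, top -H prints it in decimal
            if int(m.group("nid"), 16) == tid:
                block = [line]
        elif block is not None:
            if not line.strip():
                break  # a blank line ends the thread's block
            block.append(line)
    return "\n".join(block) if block else None


if __name__ == "__main__":
    # Hypothetical dump, shortened; real jstack output has many more threads.
    sample = (
        '"main" prio=10 tid=0x01 nid=0x1a2a waiting on condition [0x0]\n'
        "   java.lang.Thread.State: TIMED_WAITING\n"
        "\n"
        '"worker" daemon prio=10 tid=0x02 nid=0x1a2b runnable [0x0]\n'
        "   java.lang.Thread.State: RUNNABLE\n"
        "\tat Foo.bar(Foo.java:1)\n"
    )
    # 6699 decimal is 0x1a2b, i.e. the ID top -H would show for "worker".
    print(find_thread(sample, 6699))
```

If the matching entry turns out to be a JVM-internal thread (for example a GC thread, which has a header but no Java frames), the stack alone will not say much, and the OOM messages in the log are probably the more useful lead.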

The cluster is CDH3u4 on CentOS 6.2, with transparent_hugepage disabled.

Hopefully this makes sense,
-P