What to do / check / debug for root-cause analysis when the JobTracker hangs

Lately, the JobTracker in one of our production clusters has been falling into a hung state.
The 5/10/15-minute load average is only around 1, but in top the JobTracker
process sits at 100% CPU the whole time.

So I went ahead and tried top -H -p jobtracker_pid, and I always see a
single thread pegged at 100% CPU.

The hang never goes away unless we restart the JobTracker.

I found OOM errors in the JobTracker log file during the hang.

How can I find out what is really going on in that one thread that is
stuck at 100% CPU?
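
In case it helps, this is roughly how I have been trying to line up the hot
thread from top -H with the jstack output; the pid and thread id below are
just placeholders:

    # Placeholders: jt_pid = JobTracker process id, hot_tid = busy thread id from top -H
    jt_pid=4242
    hot_tid=12345
    jstack "$jt_pid" > jt.jstack
    # jstack prints native thread ids in hex as nid=0x..., so convert and search:
    grep -A 20 "nid=$(printf '0x%x' "$hot_tid")" jt.jstack

My guess is that if the matching entry is a "VM Thread" or "GC task thread"
rather than an application thread, that points at garbage-collection churn
rather than one runaway handler, but I would like to confirm that.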

And how can I prove whether we ran out of memory simply because of the
number of jobs, _OR_ because there is a memory leak on the application side?
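
For what it is worth, this is roughly what I was planning to run next to tell
plain heap exhaustion apart from a leak; the pid and dump path are
placeholders, and I believe the JobTracker's java opts go in
HADOOP_JOBTRACKER_OPTS in hadoop-env.sh, but please correct me:

    jt_pid=4242
    # Watch GC: if old gen (O) stays near 100% and FGC keeps climbing,
    # the JVM is spending its time in back-to-back full GCs.
    jstat -gcutil "$jt_pid" 5000
    # Live-object histogram; a leak usually shows one or two classes
    # growing without bound between snapshots.
    jmap -histo:live "$jt_pid" > jt.histo
    # For next time, have the JVM write a heap dump when it OOMs, e.g. via
    # HADOOP_JOBTRACKER_OPTS: -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/tmp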
I tried dumping threads with jstack, and also http://jobtracker:50030/stacks,
but I just don't know what I should really be looking at in the output of
those commands.

The cluster is CDH3u4 on CentOS 6.2, with transparent_hugepage disabled.

Hopefully this makes sense,
-P