
MapReduce, mail # user - What to do/check/debug/root cause analysis when jobtracker hang


Patai Sangbutsarakum 2013-02-04, 23:21
RE: What to do/check/debug/root cause analysis when jobtracker hang
java8964 java8964 2013-02-07, 02:12

Our cluster on CDH3u4 has the same problem. I think it is caused by a bug in the JobTracker, and I believe Cloudera knows about this issue.
After upgrading to CDH3u5, we haven't hit this issue again, but I am not sure whether the fix in CDH3u5 has been confirmed.
Yong

> Date: Mon, 4 Feb 2013 15:21:18 -0800
> Subject: What to do/check/debug/root cause analysis when jobtracker hang
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
>
> Lately, the JobTracker in one of our production clusters has been falling into a hang state.
> The 5/10/15-minute load average is around 1;
> with the top command, the JobTracker shows 100% CPU all the time.
>
> So I went ahead and tried top -H -p jobtracker_pid, and I always see a
> thread that has 100% CPU all the time.
>
> Unless we restart the JobTracker, the hang state never goes away.
>
> I found an OOM (OutOfMemoryError) in the JobTracker log file during the hang state.
>
> How can I find out what is really going on in the one and only
> thread that has 100% CPU?
>
> How can I prove whether we run out of memory because of the number of jobs
> _OR_
> because there is a memory leak on the application side?
>
>
> I tried dumping the threads with jstack and with http://jobtracker:50030/stacks,
>
> but I just don't know what I should really look at in the output of those commands.
>
> The cluster is CDH3u4 on CentOS 6.2, with transparent_hugepage disabled.
>
>
>
> Hopefully this makes sense,
> -P
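A common way to tie the hot thread from top -H to a stack trace: on Linux, the PID column that top -H -p <pid> shows for each thread is the native thread ID in decimal, while a HotSpot jstack <pid> dump labels each thread with the same ID in hex as nid=0x.... The Python sketch below is a minimal illustration under that assumption; the function names tid_to_nid and find_thread are hypothetical helpers, not part of Hadoop or the JDK. It converts the ID and pulls the matching thread entry out of a saved jstack dump. Taking several dumps a few seconds apart and checking whether that thread sits in the same frames each time can show what it is spinning on.

```python
import re
from typing import Optional


def tid_to_nid(tid: int) -> str:
    """Convert a decimal thread ID from `top -H -p <pid>` into jstack's nid form."""
    return hex(tid)  # e.g. 8011 -> "0x1f4b"


def find_thread(jstack_output: str, tid: int) -> Optional[str]:
    """Return the jstack entry whose nid equals tid, or None if no thread matches.

    Hypothetical helper: assumes the HotSpot jstack layout, where thread
    entries are separated by blank lines and the header line carries nid=0x...
    """
    for block in re.split(r"\n\s*\n", jstack_output):
        m = re.search(r"\bnid=(0x[0-9a-fA-F]+)", block)
        # Compare as integers so the case of the hex digits does not matter.
        if m and int(m.group(1), 16) == tid:
            return block.strip()
    return None
```

For example, a thread listed as 8011 in top -H appears as nid=0x1f4b in the jstack output, so find_thread(open("jstack.txt").read(), 8011) would return that thread's name, state and stack frames (jstack.txt being a dump you saved yourself).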
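On the memory question, one hedged approach is to take several class histograms of the JobTracker heap some minutes apart (jmap -histo:live jobtracker_pid; the :live option forces a full GC first, so it briefly pauses the JobTracker) and compare them. If the bytes held by job-related classes rise and fall with the number of running and retained jobs, the OOM looks like load; if some classes keep growing after jobs complete, that points to objects being retained, i.e. a leak. The sketch below is a hypothetical Python helper (parse_histo and growth are illustrative names) that parses the classic jmap -histo table format of JDK 6 to 8 and ranks classes by how much their byte count grew. Separately, starting the JVM with -XX:+HeapDumpOnOutOfMemoryError would capture a full heap dump at the moment of the OOM for offline analysis.

```python
import re
from typing import Dict, List, Tuple

# Matches data rows of `jmap -histo <pid>` output, e.g. "   1:   123456   7890123  [C".
# Header and "Total" lines do not start with "<n>:" and are skipped.
ROW = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S+)")


def parse_histo(text: str) -> Dict[str, Tuple[int, int]]:
    """Map class name -> (instances, bytes) for one jmap -histo snapshot."""
    result: Dict[str, Tuple[int, int]] = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            result[m.group(3)] = (int(m.group(1)), int(m.group(2)))
    return result


def growth(before: str, after: str, top: int = 10) -> List[Tuple[str, int]]:
    """Return up to `top` classes whose byte count grew between two snapshots."""
    a, b = parse_histo(before), parse_histo(after)
    # Classes absent from the earlier snapshot count as starting from zero bytes.
    deltas = [(cls, b[cls][1] - a.get(cls, (0, 0))[1]) for cls in b]
    deltas.sort(key=lambda d: d[1], reverse=True)
    return [d for d in deltas[:top] if d[1] > 0]
```

Running growth() over successive pairs of snapshots and seeing the same classes at the top every time, even while the number of jobs stays flat, is the pattern that would suggest a leak rather than load.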
     
+
Patai Sangbutsarakum 2013-02-07, 04:23