What to do/check/debug/root cause analysis when jobtracker hang  (MapReduce user mailing list)

Patai Sangbutsarakum  2013-02-04, 23:21
java8964 java8964  2013-02-07, 02:12

Re: What to do/check/debug/root cause analysis when jobtracker hang
I wish that were the case. I have another prod. cluster using cdh3u4 too,
but it doesn't happen there.

On Wed, Feb 6, 2013 at 6:12 PM, java8964 java8964 <[EMAIL PROTECTED]> wrote:
> Our cluster on cdh3u4 has the same problem. I think it is caused by some
> bugs in JobTracker. I believe Cloudera knows about this issue.
>
> After upgrading to cdh3u5, we haven't faced this issue yet, but I am not sure
> whether the fix is confirmed to be in cdh3u5.
>
> Yong
>
>> Date: Mon, 4 Feb 2013 15:21:18 -0800
>> Subject: What to do/check/debug/root cause analysis when jobtracker hang
>> From: [EMAIL PROTECTED]
>> To: [EMAIL PROTECTED]
>
>>
>> Lately, the jobtracker in one of our production clusters has been falling into a hung state.
>> The 5/10/15-minute load averages are around 1;
>> with the top command, the jobtracker shows 100% cpu all the time.
>>
>> So I went ahead and tried top -H -p jobtracker_pid, and I always see one
>> thread that has 100% cpu all the time.
>>
>> Unless we restart the jobtracker, the hung state never goes away.
>>
>> I found OOM errors in the jobtracker log file during the hang.
>>
>> How can I find out what is really going on in that one thread that
>> has 100% cpu?
>>
>> How can I prove whether we ran out of memory because of the number of jobs
>> _OR_
>> because there is a memory leak on the application side?
>>
>>
>> I tried a jstack dump, and http://jobtracker:50030/stacks,
>>
>> but I just don't know what I should really be looking at in the output of those commands.
>>
>> The cluster is cdh3u4, on CentOS 6.2, with transparent_hugepage disabled.
>>
>>
>>
>> Hopefully this makes sense,
>> -P
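
On the question of what the one 100%-cpu thread is actually doing: a common way to find out is to match the native thread id that top -H reports against the nid field in a jstack dump. A minimal sketch, assuming the JobTracker's pid is in $JT_PID and using 12345 as a stand-in for whatever thread id top shows (both are placeholders):

    # Per-thread CPU usage for the JobTracker JVM; the PID column here is
    # the native (Linux) thread id of each JVM thread.
    top -H -b -n 1 -p "$JT_PID"

    # jstack prints the same id in hex as "nid=0x...", so convert it.
    printf 'nid=0x%x\n' 12345          # placeholder thread id -> nid=0x3039

    # Dump all thread stacks and pull out the entry with that nid to see
    # what the busy thread is doing (repeat a few times; a thread that is
    # always in the same frames is the interesting one).
    jstack "$JT_PID" > /tmp/jt-threads.txt
    grep -A 20 'nid=0x3039' /tmp/jt-threads.txt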
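Because OOM errors show up in the jobtracker log, it is also worth checking whether that busy thread is a GC thread and the JVM is stuck in back-to-back full collections, which looks exactly like a hang with one core pegged. A sketch using jstat (the 1-second interval and sample count are just examples):

    # Sample GC activity once a second, 30 times. If the old generation (O)
    # stays near 100% and the full-GC count (FGC) increases on almost every
    # line, the JVM is spending its time collecting, not scheduling jobs.
    jstat -gcutil "$JT_PID" 1000 30

    # In the jstack output, GC and VM threads appear as entries such as
    # "GC task thread#0" or "VM Thread"; if the hot nid from top -H maps to
    # one of those, that again points at garbage collection rather than
    # JobTracker application code.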
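On the last question (ran out of memory because of the number of jobs vs. a leak): a heap histogram, or a heap dump taken at the moment of the OOM, usually settles it. A sketch under the assumption of a CDH3-era (MR1) JobTracker, where per-job state lives in classes like org.apache.hadoop.mapred.JobInProgress and TaskInProgress; the JVM flags below are standard HotSpot options:

    # Live object counts per class (forces a full GC first). If the top of
    # the histogram is dominated by JobInProgress / TaskInProgress / Counters
    # objects whose counts track the number of submitted and retained jobs,
    # the heap is simply too small for the job volume (the JobTracker keeps
    # completed jobs in memory, bounded per user by
    # mapred.jobtracker.completeuserjobs.maximum).
    jmap -histo:live "$JT_PID" | head -n 40

    # To analyse the next OOM offline, have the JVM write a heap dump when
    # it happens, e.g. by adding to the JobTracker's java opts
    # (HADOOP_JOBTRACKER_OPTS in hadoop-env.sh):
    #   -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/tmp
    # Open the resulting .hprof in a heap analyser (e.g. Eclipse MAT):
    # one ever-growing structure in the dominator tree suggests a leak,
    # while a broad spread of job objects suggests plain job volume.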