Re: Hadoop JobTracker Hanging
Until the new hardware is ready, I suggest you configure the jobtracker to
retain fewer jobs in memory, as Todd mentioned.
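
A minimal sketch of what "retain fewer jobs" could look like in practice. It assumes the 0.20-era property mapred.jobtracker.completeuserjobs.maximum (default 100, as far as I know), which caps the completed jobs kept in memory per user; neither the property nor the value 25 appears in this thread, so check both against the mapred-default.xml of your version. The helper name render_property is hypothetical. The script only prints a property element to paste into mapred-site.xml on the jobtracker node.

```python
from xml.etree import ElementTree as ET


def render_property(name, value):
    """Return a Hadoop-style <property> element as a string."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(prop, encoding="unicode")


# Assumption: property name taken from Hadoop 0.20 defaults, not from this thread.
# Assumption: 25 is an arbitrary illustrative value, not a recommendation.
snippet = render_property("mapred.jobtracker.completeuserjobs.maximum", 25)
print(snippet)
```

The jobtracker would need a restart to pick up the change.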

On Mon, Jun 21, 2010 at 12:49 PM, Bobby Dennett <[EMAIL PROTECTED]> wrote:

> Thanks all for your suggestions (please note that Tan is my co-worker;
> we are both working to try and resolve this issue)... we experienced
> another hang this weekend and increased the HADOOP_HEAPSIZE setting to
> 6000 (MB) as we do periodically see "java.lang.OutOfMemoryError: Java
> heap space" errors in the jobtracker log. We are now looking into the
> resource allocation of the master node/server to ensure we aren't
> experiencing any issues due to the heap size increase. In parallel, we
> are also working on building "beefier" servers -- stronger CPUs, 3x more
> memory -- for the node running the primary namenode and jobtracker
> processes as well as for the secondary namenode.
>
> Any additional suggestions you might have for troubleshooting/resolving
> this hanging jobtracker issue would be greatly appreciated.
>
> Please note that I had previously started a similar topic on Get
> Satisfaction
> (http://www.getsatisfaction.com/cloudera/topics/looking_for_troubleshooting_tips_guidance_for_hanging_jobtracker),
> where Todd is helping and where the output of jstack and jmap can be found.
>
> Thanks,
> -Bobby
>
> On Fri, 18 Jun 2010 15:04 -0600, "Li, Tan" <[EMAIL PROTECTED]> wrote:
> > Todd,
> > I will try to increase the HADOOP_HEAPSIZE to see if that helps.
> > Tan
> >
> > -----Original Message-----
> > From: Todd Lipcon [mailto:[EMAIL PROTECTED]]
> > Sent: Thursday, June 17, 2010 5:07 PM
> > To: [EMAIL PROTECTED]
> > Subject: Re: Hadoop JobTracker Hanging
> >
> > Li, just to narrow your search, in my experience this is usually caused by
> > an OOME on the JT. Check the logs for OutOfMemoryError and see what you find.
> > You may need to configure it to retain fewer jobs in memory, or up your heap.
> >
> > -Todd
> >
> > On Thu, Jun 17, 2010 at 5:03 PM, Li, Tan <[EMAIL PROTECTED]> wrote:
> >
> > > Thanks for your tips, Ted.
> > > All of our QA is done on 0.20.1, and I have a feeling it is not
> > > version-related.
> > > I will run jstack and jmap once the problem happens again, and I may need
> > > your help to analyze the result.
> > >
> > > Tan
> > >
> > > -----Original Message-----
> > > From: Ted Yu [mailto:[EMAIL PROTECTED]]
> > > Sent: Thursday, June 17, 2010 2:39 PM
> > > To: [EMAIL PROTECTED]
> > > Subject: Re: Hadoop JobTracker Hanging
> > >
> > > Is upgrading to hadoop-0.20.2+228 possible?
> > >
> > > Use jstack to get a stack trace of the job tracker process when this
> > > happens again.
> > > Use jmap to get shared object memory maps or heap memory details.
> > >
> > > On Thu, Jun 17, 2010 at 2:00 PM, Li, Tan <[EMAIL PROTECTED]> wrote:
> > >
> > > > Folks,
> > > >
> > > > I need some help with the job tracker.
> > > > I am running two Hadoop clusters (with 30+ nodes) on Ubuntu. One is
> > > > on version 0.19.1 (Apache) and the other is on version 0.20.1+169.68
> > > > (Cloudera).
> > > >
> > > > I have the same problem with both clusters: the job tracker hangs
> > > > almost once a day.
> > > > Symptom: The job tracker web page cannot be loaded, the command "hadoop
> > > > job -list" hangs, and the jobtracker.log file stops being updated.
> > > > I cannot find any useful information in the job tracker log file.
> > > > The symptom goes away after I restart the job tracker, and the cluster
> > > > runs fine for another 20+ hours. Then the symptom comes back.
> > > >
> > > > I do not have any serious problems with HDFS.
> > > >
> > > > Any ideas about the causes? Is there any configuration parameter that I
> > > > can change to reduce the chances of the problem?
> > > > Any tips for diagnosing and troubleshooting?
> > > >
> > > > Thanks!
> > > >
> > > > Tan
> > > >
> > > >
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
> >
>
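
The first diagnostic step suggested in this thread, checking the jobtracker log for out-of-memory errors, can be sketched as below. This is an illustration only: the function name find_oom_lines and the sample log lines are hypothetical, and only the error string "java.lang.OutOfMemoryError: Java heap space" comes from the messages above.

```python
OOM_MARKER = "java.lang.OutOfMemoryError"


def find_oom_lines(lines):
    """Return (line_number, text) pairs for lines mentioning an OutOfMemoryError."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if OOM_MARKER in line:
            hits.append((number, line.rstrip("\n")))
    return hits


# Hypothetical sample input; a real run would pass an open log file instead.
sample_log = [
    "INFO hypothetical.Component: heartbeat received\n",
    "FATAL hypothetical.Component: java.lang.OutOfMemoryError: Java heap space\n",
]

for number, text in find_oom_lines(sample_log):
    print(f"{number}: {text}")
```

To scan a real log, open the file and pass the file object to find_oom_lines; it is read line by line, so large logs are not loaded into memory at once.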