Li, Tan
2010-06-17, 21:00
Ted Yu
2010-06-17, 21:38
Todd Lipcon
2010-06-17, 21:40
Li, Tan
2010-06-17, 23:57
Li, Tan
2010-06-18, 00:03
Todd Lipcon
2010-06-18, 00:07
James Seigel
2010-06-18, 01:21
Li, Tan
2010-06-18, 17:40
Li, Tan
2010-06-18, 21:04
Bobby Dennett
2010-06-21, 19:49
James Seigel
2010-06-21, 19:51
Ted Yu
2010-06-21, 20:16
Steve Loughran
2010-06-22, 10:17
James Seigel
2010-06-22, 14:28
Allen Wittenauer
2010-06-22, 15:53
Rahul Jain
2010-06-22, 17:12
Hemanth Yamijala
2010-06-22, 17:20
Bobby Dennett
2010-06-23, 07:10
-
Hadoop JobTracker Hanging
Li, Tan 2010-06-17, 21:00

Folks,

I need some help with the job tracker.
I am running two Hadoop clusters (with 30+ nodes) on Ubuntu. One runs version 0.19.1 (Apache) and the other runs version 0.20.1+169.68 (Cloudera).

I have the same problem with both clusters: the job tracker hangs almost once a day.
Symptom: the job tracker web page cannot be loaded, the command "hadoop job -list" hangs, and the jobtracker.log file stops being updated.
I cannot find any useful information in the job tracker log file.
The symptom goes away after I restart the job tracker, and the cluster runs fine for another 20+ hours. Then the symptom comes back.

I do not have any serious problems with HDFS.

Any ideas about the causes? Is there any configuration parameter that I can change to reduce the chances of the problem?
Any tips for diagnosing and troubleshooting?

Thanks!

Tan
-
Re: Hadoop JobTracker Hanging
Ted Yu 2010-06-17, 21:38

Is upgrading to hadoop-0.20.2+228 possible?

Use jstack to get a stack trace of the job tracker process when this happens again.
Use jmap to get shared object memory maps or heap memory details.
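A minimal sketch of how the jstack and jmap captures suggested above could be scripted so that they are ready to run the moment the job tracker hangs. The output directory, the file naming scheme and the helper names are assumptions for illustration, not something from the thread. `jstack -l`, `jmap -heap` and `jmap -histo` are JDK 6-era options; newer JDKs may have dropped or changed some of them.

```python
import subprocess
import time


def diagnostic_commands(pid, out_dir="/tmp", stamp=None):
    """Return (command, output_file) pairs for inspecting a hung JVM.

    out_dir and the file naming are hypothetical choices for this sketch.
    """
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    prefix = "%s/jobtracker-%d-%s" % (out_dir, pid, stamp)
    return [
        # Thread dump with lock details: shows what each thread is blocked on.
        (["jstack", "-l", str(pid)], prefix + ".jstack.txt"),
        # Heap configuration and usage summary.
        (["jmap", "-heap", str(pid)], prefix + ".heap.txt"),
        # Per-class histogram of object counts and sizes.
        (["jmap", "-histo", str(pid)], prefix + ".histo.txt"),
    ]


def capture(pid):
    """Run every diagnostic command and save its output next to the others."""
    for command, path in diagnostic_commands(pid):
        with open(path, "w") as out:
            subprocess.call(command, stdout=out, stderr=subprocess.STDOUT)
```

Calling `capture(<jobtracker pid>)` as the user that owns the JobTracker process, while the hang is in progress, leaves three text files that can be attached to a follow-up message.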
-
Re: Hadoop JobTracker Hanging
Todd Lipcon 2010-06-17, 21:40

+1, jstack is crucial to solve these kinds of issues. Also, which scheduler are you using?

Thanks
-Todd

--
Todd Lipcon
Software Engineer, Cloudera
-
RE: Hadoop JobTracker Hanging
Li, Tan 2010-06-17, 23:57

Thanks, Todd.
I will try that and let you know the result.

Tan
-
RE: Hadoop JobTracker Hanging
Li, Tan 2010-06-18, 00:03

Thanks for your tips, Ted.
All of our QA is done on 0.20.1, and I have a feeling it is not version related.
I will run jstack and jmap once the problem happens again, and I may need your help to analyze the result.

Tan
-
Re: Hadoop JobTracker Hanging
Todd Lipcon 2010-06-18, 00:07

Li, just to narrow your search, in my experience this is usually caused by OOME on the JT. Check the logs for OutOfMemoryError, see what you find.
You may need to configure it to retain fewer jobs in memory, or up your heap.

-Todd
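A small sketch of the log check suggested above, assuming the error appears in the log under its Java class name, `java.lang.OutOfMemoryError`. The function name and the sample lines are made up for illustration; the placeholder lines are not real JobTracker output.

```python
import sys


def find_oom_lines(lines):
    """Yield (line_number, text) for every line mentioning an out-of-memory error."""
    for number, line in enumerate(lines, start=1):
        if "java.lang.OutOfMemoryError" in line:
            yield number, line.rstrip("\n")


if __name__ == "__main__":
    # Usage: python find_oom.py /path/to/jobtracker.log  (script name and path are hypothetical)
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as log:
            for number, text in find_oom_lines(log):
                print("%d: %s" % (number, text))
    else:
        # Placeholder lines only, to show the shape of the result.
        sample = [
            "placeholder log line\n",
            "java.lang.OutOfMemoryError: Java heap space\n",
            "another placeholder log line\n",
        ]
        for number, text in find_oom_lines(sample):
            print("%d: %s" % (number, text))
```

Because the function takes any iterable of lines, it can read the log file directly without loading it all into memory.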
-
Re: Hadoop JobTracker Hanging
James Seigel 2010-06-18, 01:21

Up the memory from the default to about 4x the default (heap setting). This should make it better I’d think!
We’d been having the same issue... I believe this fixed it.

James
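A rough sketch of the arithmetic behind "about 4x the default", assuming the 1000 MB default stated in the comments of the stock hadoop-env.sh. It also assumes that `HADOOP_HEAPSIZE` applies to every daemon started from the same hadoop-env.sh, so a master running both the namenode and the job tracker could need that much heap twice. The 2048 MB operating-system reserve and the function names are illustrative assumptions.

```python
DEFAULT_HEAPSIZE_MB = 1000  # assumed default, per the stock hadoop-env.sh comments


def heap_plan(multiplier, daemons_on_host, physical_ram_mb, os_reserve_mb=2048):
    """Return (heap_mb, worst_case_mb, fits) for a proposed heap multiplier.

    worst_case_mb assumes every daemon on the host grows to its full heap.
    os_reserve_mb is an arbitrary allowance for the OS and non-heap JVM memory.
    """
    heap_mb = DEFAULT_HEAPSIZE_MB * multiplier
    worst_case_mb = heap_mb * daemons_on_host
    fits = worst_case_mb + os_reserve_mb <= physical_ram_mb
    return heap_mb, worst_case_mb, fits


def env_line(heap_mb):
    """Format the setting as it would appear in conf/hadoop-env.sh."""
    return "export HADOOP_HEAPSIZE=%d" % heap_mb


heap_mb, worst_case_mb, fits = heap_plan(4, daemons_on_host=2, physical_ram_mb=16384)
print(env_line(heap_mb))
print("worst case %d MB, fits in RAM: %s" % (worst_case_mb, fits))
```

If the worst case does not fit, the master will start swapping, which can look like a hang of its own, so it is worth running the numbers before raising the heap.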
-
RE: Hadoop JobTracker Hanging
Li, Tan 2010-06-18, 17:40

Thanks for your suggestions, James.
I will try that.

Tan
-
RE: Hadoop JobTracker Hanging
Li, Tan 2010-06-18, 21:04

Todd,
I will try to increase the HADOOP_HEAPSIZE to see if that helps.

Tan
-
RE: Hadoop JobTracker Hanging
Bobby Dennett 2010-06-21, 19:49

Thanks all for your suggestions (please note that Tan is my co-worker; we are both working to try and resolve this issue)... we experienced another hang this weekend and increased the HADOOP_HEAPSIZE setting to 6000 (MB), as we do periodically see "java.lang.OutOfMemoryError: Java heap space" errors in the jobtracker log. We are now looking into the resource allocation of the master node/server to ensure we aren't experiencing any issues due to the heap size increase. In parallel, we are also working on building "beefier" servers -- stronger CPUs, 3x more memory -- for the node running the primary namenode and jobtracker processes as well as for the secondary namenode.

Any additional suggestions you might have for troubleshooting/resolving this hanging jobtracker issue would be greatly appreciated.

Please note that I had previously started a similar topic on Get Satisfaction (http://www.getsatisfaction.com/cloudera/topics/looking_for_troubleshooting_tips_guidance_for_hanging_jobtracker), where Todd is helping and where the output of jstack and jmap can be found.

Thanks,
-Bobby
-
Re: Hadoop JobTracker Hanging
James Seigel 2010-06-21, 19:51

Good luck Bobby. I hope that when you get this problem licked you’ll post your solutions to help us all learn some more stuff as well :)

Cheers
James.
-
Re: Hadoop JobTracker Hanging
Ted Yu 2010-06-21, 20:16

Before the new hardware is ready, I suggest you configure jobtracker to retain fewer jobs in memory - as Todd mentioned.
-
Re: Hadoop JobTracker Hanging
Steve Loughran 2010-06-22, 10:17

Bobby Dennett wrote:
> Any additional suggestions you might have for troubleshooting/resolving
> this hanging jobtracker issue would be greatly appreciated.

Have you tried

* using compressed object pointers on Java 6 server? They reduce space.

* bolder: JRockit JVM. Not officially supported in Hadoop, but I liked using it right up until Oracle stopped giving away the updates with security patches. It has a way better heap, and it has had compressed pointers for a long time (== more stable code).

I'm surprised it's the JT that is OOM-ing; anecdotally it's the NN and 2ary NN that use more, especially if the files are many and the blocksize small. The JT should not be tracking that much data over time.
-
Re: Hadoop JobTracker Hanging
James Seigel 2010-06-22, 14:28

+1 for compressed pointers.
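A short sketch of how the compressed-pointer suggestion might be wired in, assuming a 64-bit HotSpot JVM recent enough to accept `-XX:+UseCompressedOops` (Java 6 update 14 or later, as far as I know) and assuming the job tracker picks up extra JVM flags from `HADOOP_JOBTRACKER_OPTS` in conf/hadoop-env.sh. The helper name is hypothetical. The script only prints the line to add; it does not edit any file.

```python
def jobtracker_opts_line(extra_flags):
    """Build a hadoop-env.sh line that prepends JVM flags to the JobTracker options."""
    flags = " ".join(extra_flags)
    # Keep whatever was already in HADOOP_JOBTRACKER_OPTS by appending it.
    return 'export HADOOP_JOBTRACKER_OPTS="%s $HADOOP_JOBTRACKER_OPTS"' % flags


print(jobtracker_opts_line(["-XX:+UseCompressedOops"]))
```

Compressed pointers only help on 64-bit JVMs with heaps below roughly 32 GB, so this fits the 6000 MB heap discussed in the thread. It reduces the memory each object reference costs, but it does not remove a leak.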
-
Re: Hadoop JobTracker Hanging
Allen Wittenauer 2010-06-22, 15:53

On Jun 22, 2010, at 3:17 AM, Steve Loughran wrote:
> I'm surprised its the JT that is OOM-ing, anecdotally its the NN and 2ary NN that use more, especially if the files are many and the blocksize small. the JT should not be tracking that much data over time

Pre-0.20.2, there are definitely bugs with how the JT history is handled, causing some memory leakage.

The other fairly common condition is if you have way too many tasks per job. This is usually an indication that your data layout is way out of whack (too little data in too many files) or that you should be using CombineFileInputFormat.
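A back-of-the-envelope sketch of why "too little data in too many files" inflates the task count. It assumes the usual behaviour that each file is split on its own, so every small file costs at least one map task, while a combining input format packs several files into one split. The function names and the 64 MB figures are illustrative, and the real split calculation has more details than this.

```python
import math

MB = 1024 * 1024


def splits_per_file(file_sizes, block_size):
    """Approximate map count when every file is split independently."""
    return sum(
        max(1, int(math.ceil(size / float(block_size)))) for size in file_sizes
    )


def splits_combined(file_sizes, max_split_size):
    """Approximate map count when small files are packed into shared splits."""
    total = sum(file_sizes)
    return max(1, int(math.ceil(total / float(max_split_size))))


# Hypothetical layout: ten thousand files of 1 MB each.
sizes = [1 * MB] * 10000
print(splits_per_file(sizes, 64 * MB))
print(splits_combined(sizes, 64 * MB))
```

The job tracker keeps state for every task of every job it still holds in memory, so cutting the task count per job by a factor like this also cuts what it has to track.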
-
Re: Hadoop JobTracker Hanging
Rahul Jain 2010-06-22, 17:12

There are two issues which were fixed in 0.21.0 and can cause the job tracker to run out of memory:

https://issues.apache.org/jira/browse/MAPREDUCE-1316

and

https://issues.apache.org/jira/browse/MAPREDUCE-841

We've been hit by MAPREDUCE-841 (large jobConf objects with a large number of tasks, especially when running Pig jobs) a number of times in Hadoop 0.20.1, 0.20.2+.

The current workarounds are:

a) Be careful about what you store in the jobConf object
b) Understand and control the largest number of mappers/reducers that can be queued at any time for processing
c) Provide a lot of RAM to the jobTracker

We use (c) to save on debugging man hours most of the time :).

-Rahul
-
Re: Hadoop JobTracker Hanging
Hemanth Yamijala 2010-06-22, 17:20

There was also https://issues.apache.org/jira/browse/MAPREDUCE-1316, whose cause hit clusters at Yahoo! very badly last year. The situation was particularly noticeable in the face of lots of jobs with failed tasks and a specific fix that enabled OutOfBand heartbeats. The latter (i.e. the OOB heartbeats patch) is not in 0.20 AFAIK, but still the failed tasks could be causing it.

Thanks
Hemanth
-
Re: Hadoop JobTracker Hanging
Bobby Dennett 2010-06-23, 07:10

Thanks for the latest round of suggestions. We will definitely check out compressed object pointers and are looking into what we can do regarding the JT history. As I mentioned previously, we are working on getting stronger servers for the NN/JT node and the secondary NN node (similar to Rahul's workaround (c)). Engineering is also working on "improving" one of our processes that accesses a large number of potentially smaller files, to try and reduce our maximum number of map tasks (similar to Rahul's workaround (b)).

On a side note, our JT process has been running since Saturday morning after increasing the heap size to 6,000 MB... so far, so good. Hopefully, I didn't just jinx it ;o)

-Bobby
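On the JT history and "retain fewer jobs in memory" points raised earlier in the thread, a minimal sketch that generates the configuration entry. It assumes the relevant setting is `mapred.jobtracker.completeuserjobs.maximum` (the number of completed jobs per user that the job tracker keeps in memory, default 100 as far as I know) and that it belongs in mapred-site.xml on 0.20 or hadoop-site.xml on 0.19. The value 25 is an arbitrary illustration, not a recommendation from the thread.

```python
import xml.etree.ElementTree as ET


def property_xml(name, value):
    """Render one Hadoop-style <property> element as a string."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(prop).decode("ascii")


# 25 is an illustrative value only; pick one that suits the job volume.
snippet = property_xml("mapred.jobtracker.completeuserjobs.maximum", 25)
print(snippet)
```

The printed element goes inside the `<configuration>` block of the job tracker's site file, and the job tracker needs a restart to pick it up.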