|
Marc Limotte
2009-09-23, 18:06
Edward Capriolo
2009-09-24, 14:50
Marc Limotte
2009-09-24, 16:57
Todd Lipcon
2009-09-24, 17:18
Marc Limotte
2009-09-24, 18:24
Todd Lipcon
2009-09-24, 18:27
Koji Noguchi
2009-09-24, 18:37
Marc Limotte
2009-09-24, 21:11
Marc Limotte
2009-09-24, 21:19
Marc Limotte
2009-09-24, 22:54
Vinod KV
2009-09-25, 03:22
Feng, Ao
2009-10-09, 17:47
Frank Singleton
2009-10-09, 18:27
Marc Limotte
2009-10-27, 21:48
|
-
Task process exit with nonzero status of 1Marc Limotte 2009-09-23, 18:06
I'm seeing this error when I try to run my job.
java.io.IOException: Task process exit with nonzero status of 1. at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418) >From what I can find by doing some Google searches, this means the mapred task JVM has crashed. Not many suggestions about what to do about it. Some suggestions about increasing max heap. I tried that, although I don't think that's the issue because it's not a particularly memory intensive process and I've even tried it with a super small input data set of only a few records. Still see the same issue. Can't find anything else in the logs. I don't think my task even started, because there are no user logs created at all. Seems to fail during Job Setup. A little more background. This job was working fine for weeks, running hourly, and then failed on Saturday morning and hasn't worked since. Obviously, I looked for something that changed at that point, but no one was working at that time... can't find anything that changed. I tried the job with different input data sets, doesn't seem to matter, unless I run it with no data at all. The job does run with no input data, but if I have even a few input records it fails-doesn't seem to matter which records. I suspected some corruption in HDFS, but I was able to extract the data from HDFS (hadoop dfs -get ...) and the data looks ok. I also copied this data set to our TEST cluster and ran the job there... and it WORKED! Ran one of our other jobs and it failed as well, so it doesn't seem to be job specific either; looks like every job fails the same way. Did a complete reboot of the cluster-no impact. We're using Hadoop 0.20.0, and Java 1.6 update 16 on CentOS 5.2 64bit. Any suggestions on what could be wrong or where to look for more information would be appreciated. Marc Limotte Feeva Technology PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE, DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM.
-
Re: Task process exit with nonzero status of 1Edward Capriolo 2009-09-24, 14:50
On Wed, Sep 23, 2009 at 2:06 PM, Marc Limotte <[EMAIL PROTECTED]> wrote:
> I'm seeing this error when I try to run my job. > > java.io.IOException: Task process exit with nonzero status of 1. > at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418) > > From what I can find by doing some Google searches, this means the mapred task JVM has crashed. Not many suggestions about what to do about it. Some suggestions about increasing max heap. I tried that, although I don't think that's the issue because it's not a particularly memory intensive process and I've even tried it with a super small input data set of only a few records. Still see the same issue. > > Can't find anything else in the logs. I don't think my task even started, because there are no user logs created at all. Seems to fail during Job Setup. > > A little more background. This job was working fine for weeks, running hourly, and then failed on Saturday morning and hasn't worked since. Obviously, I looked for something that changed at that point, but no one was working at that time... can't find anything that changed. I tried the job with different input data sets, doesn't seem to matter, unless I run it with no data at all. The job does run with no input data, but if I have even a few input records it fails-doesn't seem to matter which records. I suspected some corruption in HDFS, but I was able to extract the data from HDFS (hadoop dfs -get ...) and the data looks ok. I also copied this data set to our TEST cluster and ran the job there... and it WORKED! > > Ran one of our other jobs and it failed as well, so it doesn't seem to be job specific either; looks like every job fails the same way. > > Did a complete reboot of the cluster-no impact. > > We're using Hadoop 0.20.0, and Java 1.6 update 16 on CentOS 5.2 64bit. > > Any suggestions on what could be wrong or where to look for more information would be appreciated. > > > > Marc Limotte > Feeva Technology > > PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE, DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM. > Just a shot in the dark.... Did you update java recently http://www.koopman.me/2009/04/hadoop-0183-could-not-create-the-java-virtual-machine/
-
RE: Task process exit with nonzero status of 1Marc Limotte 2009-09-24, 16:57
Thanks for the suggestion, Edward. I only upgraded the JVM after the problem occurred to see if it would help, but it made no difference.
Marc -----Original Message----- From: Edward Capriolo [mailto:[EMAIL PROTECTED]] Sent: Thursday, September 24, 2009 7:50 AM To: [EMAIL PROTECTED] Subject: Re: Task process exit with nonzero status of 1 On Wed, Sep 23, 2009 at 2:06 PM, Marc Limotte <[EMAIL PROTECTED]> wrote: > I'm seeing this error when I try to run my job. > > java.io.IOException: Task process exit with nonzero status of 1. > at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418) > > From what I can find by doing some Google searches, this means the mapred task JVM has crashed. Not many suggestions about what to do about it. Some suggestions about increasing max heap. I tried that, although I don't think that's the issue because it's not a particularly memory intensive process and I've even tried it with a super small input data set of only a few records. Still see the same issue. > > Can't find anything else in the logs. I don't think my task even started, because there are no user logs created at all. Seems to fail during Job Setup. > > A little more background. This job was working fine for weeks, running hourly, and then failed on Saturday morning and hasn't worked since. Obviously, I looked for something that changed at that point, but no one was working at that time... can't find anything that changed. I tried the job with different input data sets, doesn't seem to matter, unless I run it with no data at all. The job does run with no input data, but if I have even a few input records it fails-doesn't seem to matter which records. I suspected some corruption in HDFS, but I was able to extract the data from HDFS (hadoop dfs -get ...) and the data looks ok. I also copied this data set to our TEST cluster and ran the job there... and it WORKED! > > Ran one of our other jobs and it failed as well, so it doesn't seem to be job specific either; looks like every job fails the same way. > > Did a complete reboot of the cluster-no impact. > > We're using Hadoop 0.20.0, and Java 1.6 update 16 on CentOS 5.2 64bit. > > Any suggestions on what could be wrong or where to look for more information would be appreciated. > > > > Marc Limotte > Feeva Technology > > PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE, DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM. > Just a shot in the dark.... Did you update java recently http://www.koopman.me/2009/04/hadoop-0183-could-not-create-the-java-virtual-machine/ PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE, DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM.
-
Re: Task process exit with nonzero status of 1Todd Lipcon 2009-09-24, 17:18
Hi Marc,
Exit status 1 usually means some kind of controlled exit by the mapreduce child task. Things like JVM crashes usually are indicated by other exit codes (134 seems to be the code most commonly reported). If you look at the stderr and stdout from your task (in the userlogs/ directory on the task tracker that ran them) do you see any output? Additionally, is there anything in the logs for the task tracker itself? That log is hadoop.log.dir/hadoop-<username>-tasktracker*log If that log is pretty long, try grepping for WARN, ERROR, or Exception -Todd On Thu, Sep 24, 2009 at 9:57 AM, Marc Limotte <[EMAIL PROTECTED]> wrote: > Thanks for the suggestion, Edward. I only upgraded the JVM after the > problem occurred to see if it would help, but it made no difference. > > Marc > > -----Original Message----- > From: Edward Capriolo [mailto:[EMAIL PROTECTED]] > Sent: Thursday, September 24, 2009 7:50 AM > To: [EMAIL PROTECTED] > Subject: Re: Task process exit with nonzero status of 1 > > On Wed, Sep 23, 2009 at 2:06 PM, Marc Limotte <[EMAIL PROTECTED]> wrote: > > I'm seeing this error when I try to run my job. > > > > java.io.IOException: Task process exit with nonzero status of 1. > > at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418) > > > > From what I can find by doing some Google searches, this means the mapred > task JVM has crashed. Not many suggestions about what to do about it. Some > suggestions about increasing max heap. I tried that, although I don't think > that's the issue because it's not a particularly memory intensive process > and I've even tried it with a super small input data set of only a few > records. Still see the same issue. > > > > Can't find anything else in the logs. I don't think my task even > started, because there are no user logs created at all. Seems to fail during > Job Setup. > > > > A little more background. This job was working fine for weeks, running > hourly, and then failed on Saturday morning and hasn't worked since. > Obviously, I looked for something that changed at that point, but no one > was working at that time... can't find anything that changed. I tried the > job with different input data sets, doesn't seem to matter, unless I run it > with no data at all. The job does run with no input data, but if I have > even a few input records it fails-doesn't seem to matter which records. I > suspected some corruption in HDFS, but I was able to extract the data from > HDFS (hadoop dfs -get ...) and the data looks ok. I also copied this data > set to our TEST cluster and ran the job there... and it WORKED! > > > > Ran one of our other jobs and it failed as well, so it doesn't seem to be > job specific either; looks like every job fails the same way. > > > > Did a complete reboot of the cluster-no impact. > > > > We're using Hadoop 0.20.0, and Java 1.6 update 16 on CentOS 5.2 64bit. > > > > Any suggestions on what could be wrong or where to look for more > information would be appreciated. > > > > > > > > Marc Limotte > > Feeva Technology > > > > PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR > ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION > PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE, > DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY > PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND > PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM. > > > Just a shot in the dark.... > > Did you update java recently > > > http://www.koopman.me/2009/04/hadoop-0183-could-not-create-the-java-virtual-machine/ > > PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR > ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION > PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE, > DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY > PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND
-
RE: Task process exit with nonzero status of 1Marc Limotte 2009-09-24, 18:24
Hi Todd.
No userlogs seem to be created. I'm guessing, because the map task never actually starts. I don't see any other errors in the tasktracker log, other than the one I put in the first message ("java.io.IOException: Task process exit with nonzero status of 1..."). I've included the output from one of the nodes' tasktracker logs below. Any other suggestions? Marc 2009-09-24 18:15:36,955 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_200909221656_0006_m_000003_0 task's state:UNASSIGNED 2009-09-24 18:15:36,959 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_200909221656_0006_m_000003_0 2009-09-24 18:15:36,960 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_200909221656_0006_m_000003_02009-09-24 18:15:37,483 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_200909221656_0006_m_-145 18051982009-09-24 18:15:37,483 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_200909221656_0006_m_-1451805198 spawned. 2009-09-24 18:15:37,511 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_200909221656_0006_m_-1451805198 exited. Number of t asks it ran: 02009-09-24 18:15:37,512 WARN org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_m_000003_0 Child Error java.io.IOException: Task process exit with nonzero status of 1. at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418) 2009-09-24 18:15:40,518 INFO org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_m_000003_0 done; removing files. 2009-09-24 18:15:40,519 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2 2009-09-24 18:15:42,964 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_200909221656_0006_r _000001_0 task's state:UNASSIGNED2009-09-24 18:15:42,964 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_200909221656_0006_r_000001_0 2009-09-24 18:15:42,964 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_200909221656_0006_r_000001_02009-09-24 18:15:43,000 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_200909221656_0006_r_7885 020722009-09-24 18:15:43,000 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_200909221656_0006_r_788502072 spawned. 2009-09-24 18:15:43,026 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_200909221656_0006_r_788502072 exited. Number of tas ks it ran: 0 2009-09-24 18:15:43,026 WARN org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_r_000001_0 Child Error java.io.IOException: Task process exit with nonzero status of 1. at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)2009-09-24 18:15:46,034 INFO org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_r_000001_0 done; removing files. 2009-09-24 18:15:46,039 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2 2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_200909221656_0006_m _000002_1 task's state:UNASSIGNED 2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_200909221656_0006_m_000002_1 2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_200909221656_0006_m_000002_1 2009-09-24 18:16:34,060 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_200909221656_0006_m_-2120349138 2009-09-24 18:16:34,060 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_200909221656_0006_m_-2120349138 spawned. 2009-09-24 18:16:34,086 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_200909221656_0006_m_-2120349138 exited. Number of tasks it ran: 0 2009-09-24 18:16:34,087 WARN org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_m_000002_1 Child Error java.io.IOException: Task process exit with nonzero status of 1. at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418) 2009-09-24 18:16:37,094 INFO org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_m_000002_1 done; removing files. 2009-09-24 18:16:37,095 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2 2009-09-24 18:16:40,032 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_200909221656_0006_r_000000_1 task's state:UNASSIGNED 2009-09-24 18:16:40,032 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_200909221656_0006_r_000000_1 2009-09-24 18:16:40,032 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_200909221656_0006_r_000000_1 2009-09-24 18:16:40,057 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_200909221656_0006_r_-1417908695 2009-09-24 18:16:40,057 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_200909221656_0006_r_-1417908695 spawned. 2009-09-24 18:16:40,084 WARN org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_r_000000_1 Child Error java.io.IOException: Task process exit with nonzero status of 1. at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418) 2009-09-24 18:16:40,084 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_200909221656_0006_r_-1417908695 exited. Number of tasks it ran: 0 2009-09-24 18:16:43,091 INFO org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_r_000000_1 done; removing files. 2009-09-24 18:16:43,092 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2 2009-09-24 18:17:07,057 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_200909221656_0006 From: Todd Lipcon [mailto:[EMAIL PROTECTED]] Sent: Thursday, September 24, 2009 10:19 AM To: [EMAIL PROTECTED] Subject: Re: Task process exit with nonzero status of 1 Hi Marc
-
Re: Task process exit with nonzero status of 1Todd Lipcon 2009-09-24, 18:27
Odd...
Try bumping up the logs to debug level on that tasktracker, see what you can determine? You could also strace -f -p <tasktracker pid> -o /tmp/tt_log and then grep through those logs later to see what might be going on. -Todd On Thu, Sep 24, 2009 at 11:24 AM, Marc Limotte <[EMAIL PROTECTED]> wrote: > Hi Todd. > > No userlogs seem to be created. I'm guessing, because the map task never > actually starts. > > I don't see any other errors in the tasktracker log, other than the one I > put in the first message ("java.io.IOException: Task process exit with > nonzero status of 1..."). I've included the output from one of the nodes' > tasktracker logs below. > > Any other suggestions? > > Marc > > 2009-09-24 18:15:36,955 INFO org.apache.hadoop.mapred.TaskTracker: > LaunchTaskAction (registerTask): attempt_200909221656_0006_m_000003_0 task's > state:UNASSIGNED > 2009-09-24 18:15:36,959 INFO org.apache.hadoop.mapred.TaskTracker: Trying > to launch : attempt_200909221656_0006_m_000003_0 > 2009-09-24 18:15:36,960 INFO org.apache.hadoop.mapred.TaskTracker: In > TaskLauncher, current free slots : 2 and trying to launch > attempt_200909221656_0006_m_000003_02009-09-24 18:15:37,483 INFO > org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: > jvm_200909221656_0006_m_-145 > 18051982009-09-24 18:15:37,483 INFO org.apache.hadoop.mapred.JvmManager: > JVM Runner jvm_200909221656_0006_m_-1451805198 spawned. > 2009-09-24 18:15:37,511 INFO org.apache.hadoop.mapred.JvmManager: JVM : > jvm_200909221656_0006_m_-1451805198 exited. Number of t > asks it ran: 02009-09-24 18:15:37,512 WARN > org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_m_000003_0 > Child Error > java.io.IOException: Task process exit with nonzero status of 1. > at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418) > 2009-09-24 18:15:40,518 INFO org.apache.hadoop.mapred.TaskRunner: > attempt_200909221656_0006_m_000003_0 done; removing files. > 2009-09-24 18:15:40,519 INFO org.apache.hadoop.mapred.TaskTracker: > addFreeSlot : current free slots : 2 > 2009-09-24 18:15:42,964 INFO org.apache.hadoop.mapred.TaskTracker: > LaunchTaskAction (registerTask): attempt_200909221656_0006_r > _000001_0 task's state:UNASSIGNED2009-09-24 18:15:42,964 INFO > org.apache.hadoop.mapred.TaskTracker: Trying to launch : > attempt_200909221656_0006_r_000001_0 > 2009-09-24 18:15:42,964 INFO org.apache.hadoop.mapred.TaskTracker: In > TaskLauncher, current free slots : 2 and trying to launch > attempt_200909221656_0006_r_000001_02009-09-24 18:15:43,000 INFO > org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: > jvm_200909221656_0006_r_7885 > 020722009-09-24 18:15:43,000 INFO org.apache.hadoop.mapred.JvmManager: JVM > Runner jvm_200909221656_0006_r_788502072 spawned. > 2009-09-24 18:15:43,026 INFO org.apache.hadoop.mapred.JvmManager: JVM : > jvm_200909221656_0006_r_788502072 exited. Number of tas > ks it ran: 0 > 2009-09-24 18:15:43,026 WARN org.apache.hadoop.mapred.TaskRunner: > attempt_200909221656_0006_r_000001_0 Child Error > java.io.IOException: Task process exit with nonzero status of 1. > at > org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)2009-09-24 > 18:15:46,034 INFO org.apache.hadoop.mapred.TaskRunner: > attempt_200909221656_0006_r_000001_0 done; removing files. > 2009-09-24 18:15:46,039 INFO org.apache.hadoop.mapred.TaskTracker: > addFreeSlot : current free slots : 2 > 2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker: > LaunchTaskAction (registerTask): attempt_200909221656_0006_m > _000002_1 task's state:UNASSIGNED > 2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker: Trying > to launch : attempt_200909221656_0006_m_000002_1 > 2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker: In > TaskLauncher, current free slots : 2 and trying to launch > attempt_200909221656_0006_m_000002_1 > 2009-09-24 18:16:34,060 INFO org.apache.hadoop.mapred.JvmManager: In > JvmRunner constructed JVM ID: jvm_200909221656_0006_m_-2120349138
-
Re: Task process exit with nonzero status of 1Koji Noguchi 2009-09-24, 18:37
> > A little more background. This job was working fine for weeks, running
> > hourly, and then failed on Saturday morning and hasn't worked since. Any chance that ulimit (mapred.child.ulimit) got enabled? Koji On 9/24/09 11:24 AM, "Marc Limotte" <[EMAIL PROTECTED]> wrote: > Hi Todd. > > No userlogs seem to be created. I'm guessing, because the map task never > actually starts. > > I don't see any other errors in the tasktracker log, other than the one I put > in the first message ("java.io.IOException: Task process exit with nonzero > status of 1..."). I've included the output from one of the nodes' tasktracker > logs below. > > Any other suggestions? > > Marc > > 2009-09-24 18:15:36,955 INFO org.apache.hadoop.mapred.TaskTracker: > LaunchTaskAction (registerTask): attempt_200909221656_0006_m_000003_0 task's > state:UNASSIGNED > 2009-09-24 18:15:36,959 INFO org.apache.hadoop.mapred.TaskTracker: Trying to > launch : attempt_200909221656_0006_m_000003_0 > 2009-09-24 18:15:36,960 INFO org.apache.hadoop.mapred.TaskTracker: In > TaskLauncher, current free slots : 2 and trying to launch > attempt_200909221656_0006_m_000003_02009-09-24 18:15:37,483 INFO > org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: > jvm_200909221656_0006_m_-145 > 18051982009-09-24 18:15:37,483 INFO org.apache.hadoop.mapred.JvmManager: JVM > Runner jvm_200909221656_0006_m_-1451805198 spawned. > 2009-09-24 18:15:37,511 INFO org.apache.hadoop.mapred.JvmManager: JVM : > jvm_200909221656_0006_m_-1451805198 exited. Number of t > asks it ran: 02009-09-24 18:15:37,512 WARN > org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_m_000003_0 > Child Error > java.io.IOException: Task process exit with nonzero status of 1. > at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418) > 2009-09-24 18:15:40,518 INFO org.apache.hadoop.mapred.TaskRunner: > attempt_200909221656_0006_m_000003_0 done; removing files. > 2009-09-24 18:15:40,519 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot > : current free slots : 2 > 2009-09-24 18:15:42,964 INFO org.apache.hadoop.mapred.TaskTracker: > LaunchTaskAction (registerTask): attempt_200909221656_0006_r > _000001_0 task's state:UNASSIGNED2009-09-24 18:15:42,964 INFO > org.apache.hadoop.mapred.TaskTracker: Trying to launch : > attempt_200909221656_0006_r_000001_0 > 2009-09-24 18:15:42,964 INFO org.apache.hadoop.mapred.TaskTracker: In > TaskLauncher, current free slots : 2 and trying to launch > attempt_200909221656_0006_r_000001_02009-09-24 18:15:43,000 INFO > org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: > jvm_200909221656_0006_r_7885 > 020722009-09-24 18:15:43,000 INFO org.apache.hadoop.mapred.JvmManager: JVM > Runner jvm_200909221656_0006_r_788502072 spawned. > 2009-09-24 18:15:43,026 INFO org.apache.hadoop.mapred.JvmManager: JVM : > jvm_200909221656_0006_r_788502072 exited. Number of tas > ks it ran: 0 > 2009-09-24 18:15:43,026 WARN org.apache.hadoop.mapred.TaskRunner: > attempt_200909221656_0006_r_000001_0 Child Error > java.io.IOException: Task process exit with nonzero status of 1. > at > org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)2009-09-24 > 18:15:46,034 INFO org.apache.hadoop.mapred.TaskRunner: > attempt_200909221656_0006_r_000001_0 done; removing files. > 2009-09-24 18:15:46,039 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot > : current free slots : 2 > 2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker: > LaunchTaskAction (registerTask): attempt_200909221656_0006_m > _000002_1 task's state:UNASSIGNED > 2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker: Trying to > launch : attempt_200909221656_0006_m_000002_1 > 2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker: In > TaskLauncher, current free slots : 2 and trying to launch > attempt_200909221656_0006_m_000002_1 > 2009-09-24 18:16:34,060 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner > constructed JVM ID
-
RE: Task process exit with nonzero status of 1Marc Limotte 2009-09-24, 21:11
Hi Koji,
Thanks for the suggestion. We have not set mapred.child.ulimit in our hadoop conf files. And I verified that it was not set in the logged job.conf. Don't see any limits set at the OS level, either. Marc -----Original Message----- From: Koji Noguchi [mailto:[EMAIL PROTECTED]] Sent: Thursday, September 24, 2009 11:37 AM To: [EMAIL PROTECTED] Subject: Re: Task process exit with nonzero status of 1 > > A little more background. This job was working fine for weeks, running > > hourly, and then failed on Saturday morning and hasn't worked since. Any chance that ulimit (mapred.child.ulimit) got enabled? Koji On 9/24/09 11:24 AM, "Marc Limotte" <[EMAIL PROTECTED]> wrote: > Hi Todd. > > No userlogs seem to be created. I'm guessing, because the map task never > actually starts. > > I don't see any other errors in the tasktracker log, other than the one I put > in the first message ("java.io.IOException: Task process exit with nonzero > status of 1..."). I've included the output from one of the nodes' tasktracker > logs below. > > Any other suggestions? > > Marc > > 2009-09-24 18:15:36,955 INFO org.apache.hadoop.mapred.TaskTracker: > LaunchTaskAction (registerTask): attempt_200909221656_0006_m_000003_0 task's > state:UNASSIGNED > 2009-09-24 18:15:36,959 INFO org.apache.hadoop.mapred.TaskTracker: Trying to > launch : attempt_200909221656_0006_m_000003_0 > 2009-09-24 18:15:36,960 INFO org.apache.hadoop.mapred.TaskTracker: In > TaskLauncher, current free slots : 2 and trying to launch > attempt_200909221656_0006_m_000003_02009-09-24 18:15:37,483 INFO > org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: > jvm_200909221656_0006_m_-145 > 18051982009-09-24 18:15:37,483 INFO org.apache.hadoop.mapred.JvmManager: JVM > Runner jvm_200909221656_0006_m_-1451805198 spawned. > 2009-09-24 18:15:37,511 INFO org.apache.hadoop.mapred.JvmManager: JVM : > jvm_200909221656_0006_m_-1451805198 exited. Number of t > asks it ran: 02009-09-24 18:15:37,512 WARN > org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_m_000003_0 > Child Error > java.io.IOException: Task process exit with nonzero status of 1. > at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418) > 2009-09-24 18:15:40,518 INFO org.apache.hadoop.mapred.TaskRunner: > attempt_200909221656_0006_m_000003_0 done; removing files. > 2009-09-24 18:15:40,519 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot > : current free slots : 2 > 2009-09-24 18:15:42,964 INFO org.apache.hadoop.mapred.TaskTracker: > LaunchTaskAction (registerTask): attempt_200909221656_0006_r > _000001_0 task's state:UNASSIGNED2009-09-24 18:15:42,964 INFO > org.apache.hadoop.mapred.TaskTracker: Trying to launch : > attempt_200909221656_0006_r_000001_0 > 2009-09-24 18:15:42,964 INFO org.apache.hadoop.mapred.TaskTracker: In > TaskLauncher, current free slots : 2 and trying to launch > attempt_200909221656_0006_r_000001_02009-09-24 18:15:43,000 INFO > org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: > jvm_200909221656_0006_r_7885 > 020722009-09-24 18:15:43,000 INFO org.apache.hadoop.mapred.JvmManager: JVM > Runner jvm_200909221656_0006_r_788502072 spawned. > 2009-09-24 18:15:43,026 INFO org.apache.hadoop.mapred.JvmManager: JVM : > jvm_200909221656_0006_r_788502072 exited. Number of tas > ks it ran: 0 > 2009-09-24 18:15:43,026 WARN org.apache.hadoop.mapred.TaskRunner: > attempt_200909221656_0006_r_000001_0 Child Error > java.io.IOException: Task process exit with nonzero status of 1. > at > org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)2009-09-24 > 18:15:46,034 INFO org.apache.hadoop.mapred.TaskRunner: > attempt_200909221656_0006_r_000001_0 done; removing files. > 2009-09-24 18:15:46,039 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot > : current free slots : 2 > 2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker: > LaunchTaskAction (registerTask): attempt_200909221656_0006_m > _000002_1 task's state:UNASSIGNED PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE, DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM.
-
RE: Task process exit with nonzero status of 1Marc Limotte 2009-09-24, 21:19
Added DEBUG, but don't see anything interesting. The only new tasktracker log entries are about receiving a heartbeat from the JobTracker, or about cleaning up the task afterward.
Tried the strace. It produces over 6mm lines of output. Not sure what I should be looking for. I'm thinking I might try the Cloudera Hadoop 0.20.0 distribution and see if the behavior is any different. Marc -----Original Message----- From: Todd Lipcon [mailto:[EMAIL PROTECTED]] Sent: Thursday, September 24, 2009 11:28 AM To: [EMAIL PROTECTED] Subject: Re: Task process exit with nonzero status of 1 Odd... Try bumping up the logs to debug level on that tasktracker, see what you can determine? You could also strace -f -p <tasktracker pid> -o /tmp/tt_log and then grep through those logs later to see what might be going on. -Todd On Thu, Sep 24, 2009 at 11:24 AM, Marc Limotte <[EMAIL PROTECTED]> wrote: > Hi Todd. > > No userlogs seem to be created. I'm guessing, because the map task never > actually starts. > > I don't see any other errors in the tasktracker log, other than the one I > put in the first message ("java.io.IOException: Task process exit with > nonzero status of 1..."). I've included the output from one of the nodes' > tasktracker logs below. > > Any other suggestions? > > Marc > > 2009-09-24 18:15:36,955 INFO org.apache.hadoop.mapred.TaskTracker: > LaunchTaskAction (registerTask): attempt_200909221656_0006_m_000003_0 task's > state:UNASSIGNED > 2009-09-24 18:15:36,959 INFO org.apache.hadoop.mapred.TaskTracker: Trying > to launch : attempt_200909221656_0006_m_000003_0 > 2009-09-24 18:15:36,960 INFO org.apache.hadoop.mapred.TaskTracker: In > TaskLauncher, current free slots : 2 and trying to launch > attempt_200909221656_0006_m_000003_02009-09-24 18:15:37,483 INFO > org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: > jvm_200909221656_0006_m_-145 > 18051982009-09-24 18:15:37,483 INFO org.apache.hadoop.mapred.JvmManager: > JVM Runner jvm_200909221656_0006_m_-1451805198 spawned. > 2009-09-24 18:15:37,511 INFO org.apache.hadoop.mapred.JvmManager: JVM : > jvm_200909221656_0006_m_-1451805198 exited. Number of t > asks it ran: 02009-09-24 18:15:37,512 WARN > org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_m_000003_0 > Child Error > java.io.IOException: Task process exit with nonzero status of 1. > at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418) > 2009-09-24 18:15:40,518 INFO org.apache.hadoop.mapred.TaskRunner: > attempt_200909221656_0006_m_000003_0 done; removing files. > 2009-09-24 18:15:40,519 INFO org.apache.hadoop.mapred.TaskTracker: > addFreeSlot : current free slots : 2 > 2009-09-24 18:15:42,964 INFO org.apache.hadoop.mapred.TaskTracker: > LaunchTaskAction (registerTask): attempt_200909221656_0006_r > _000001_0 task's state:UNASSIGNED2009-09-24 18:15:42,964 INFO > org.apache.hadoop.mapred.TaskTracker: Trying to launch : > attempt_200909221656_0006_r_000001_0 > 2009-09-24 18:15:42,964 INFO org.apache.hadoop.mapred.TaskTracker: In > TaskLauncher, current free slots : 2 and trying to launch > attempt_200909221656_0006_r_000001_02009-09-24 18:15:43,000 INFO > org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: > jvm_200909221656_0006_r_7885 > 020722009-09-24 18:15:43,000 INFO org.apache.hadoop.mapred.JvmManager: JVM > Runner jvm_200909221656_0006_r_788502072 spawned. > 2009-09-24 18:15:43,026 INFO org.apache.hadoop.mapred.JvmManager: JVM : > jvm_200909221656_0006_r_788502072 exited. Number of tas > ks it ran: 0 > 2009-09-24 18:15:43,026 WARN org.apache.hadoop.mapred.TaskRunner: > attempt_200909221656_0006_r_000001_0 Child Error > java.io.IOException: Task process exit with nonzero status of 1. > at > org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)2009-09-24 > 18:15:46,034 INFO org.apache.hadoop.mapred.TaskRunner: > attempt_200909221656_0006_r_000001_0 done; removing files. > 2009-09-24 18:15:46,039 INFO org.apache.hadoop.mapred.TaskTracker: PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE, DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM.
-
RE: Task process exit with nonzero status of 1Marc Limotte 2009-09-24, 22:54
One more clue.
If I change "mapred.job.tracker" to "local" on this cluster, then the I can run the job successfully. I guess in this case it doesn't have to launch the child JVM, which is the thing that is failing. Marc -----Original Message----- From: Marc Limotte [mailto:[EMAIL PROTECTED]] Sent: Thursday, September 24, 2009 2:19 PM To: [EMAIL PROTECTED] Cc: Deept Kumar Subject: RE: Task process exit with nonzero status of 1 Added DEBUG, but don't see anything interesting. The only new tasktracker log entries are about receiving a heartbeat from the JobTracker, or about cleaning up the task afterward. Tried the strace. It produces over 6mm lines of output. Not sure what I should be looking for. I'm thinking I might try the Cloudera Hadoop 0.20.0 distribution and see if the behavior is any different. Marc -----Original Message----- From: Todd Lipcon [mailto:[EMAIL PROTECTED]] Sent: Thursday, September 24, 2009 11:28 AM To: [EMAIL PROTECTED] Subject: Re: Task process exit with nonzero status of 1 Odd... Try bumping up the logs to debug level on that tasktracker, see what you can determine? You could also strace -f -p <tasktracker pid> -o /tmp/tt_log and then grep through those logs later to see what might be going on. -Todd On Thu, Sep 24, 2009 at 11:24 AM, Marc Limotte <[EMAIL PROTECTED]> wrote: > Hi Todd. > > No userlogs seem to be created. I'm guessing, because the map task never > actually starts. > > I don't see any other errors in the tasktracker log, other than the one I > put in the first message ("java.io.IOException: Task process exit with > nonzero status of 1..."). I've included the output from one of the nodes' > tasktracker logs below. > > Any other suggestions? > > Marc > > 2009-09-24 18:15:36,955 INFO org.apache.hadoop.mapred.TaskTracker: > LaunchTaskAction (registerTask): attempt_200909221656_0006_m_000003_0 task's > state:UNASSIGNED > 2009-09-24 18:15:36,959 INFO org.apache.hadoop.mapred.TaskTracker: Trying > to launch : attempt_200909221656_0006_m_000003_0 > 2009-09-24 18:15:36,960 INFO org.apache.hadoop.mapred.TaskTracker: In > TaskLauncher, current free slots : 2 and trying to launch > attempt_200909221656_0006_m_000003_02009-09-24 18:15:37,483 INFO > org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: > jvm_200909221656_0006_m_-145 > 18051982009-09-24 18:15:37,483 INFO org.apache.hadoop.mapred.JvmManager: > JVM Runner jvm_200909221656_0006_m_-1451805198 spawned. > 2009-09-24 18:15:37,511 INFO org.apache.hadoop.mapred.JvmManager: JVM : > jvm_200909221656_0006_m_-1451805198 exited. Number of t > asks it ran: 02009-09-24 18:15:37,512 WARN > org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_m_000003_0 > Child Error > java.io.IOException: Task process exit with nonzero status of 1. > at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418) > 2009-09-24 18:15:40,518 INFO org.apache.hadoop.mapred.TaskRunner: > attempt_200909221656_0006_m_000003_0 done; removing files. > 2009-09-24 18:15:40,519 INFO org.apache.hadoop.mapred.TaskTracker: > addFreeSlot : current free slots : 2 > 2009-09-24 18:15:42,964 INFO org.apache.hadoop.mapred.TaskTracker: > LaunchTaskAction (registerTask): attempt_200909221656_0006_r > _000001_0 task's state:UNASSIGNED2009-09-24 18:15:42,964 INFO > org.apache.hadoop.mapred.TaskTracker: Trying to launch : > attempt_200909221656_0006_r_000001_0 > 2009-09-24 18:15:42,964 INFO org.apache.hadoop.mapred.TaskTracker: In > TaskLauncher, current free slots : 2 and trying to launch > attempt_200909221656_0006_r_000001_02009-09-24 18:15:43,000 INFO > org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: > jvm_200909221656_0006_r_7885 > 020722009-09-24 18:15:43,000 INFO org.apache.hadoop.mapred.JvmManager: JVM > Runner jvm_200909221656_0006_r_788502072 spawned. > 2009-09-24 18:15:43,026 INFO org.apache.hadoop.mapred.JvmManager: JVM : > jvm_200909221656_0006_r_788502072 exited. Number of tas > ks it ran: 0 PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE, DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM. PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE, DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM. PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE, DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM.
-
Re: Task process exit with nonzero status of 1Vinod KV 2009-09-25, 03:22
Marc, Looking at the logs, TaskTracker indeed did try to launch the JVM but failed immediately. So either it is related to the JVM or the log files, given your statement that your code used to work properly before. Looks like you came out blank w.r.t JVM Can you try and see if the userlogs/tasklogs directory has permissions for the user running the jvm to write files to? User running the JVM is same as the one that runs TT, unless you use LinuxTaskController. If that doesn't take you anywhere too, can you dump your TT configuration, may be after making anonymous any site specific information like hosts/ports, if you wish? Vinod Marc Limotte wrote: > One more clue. > > If I change "mapred.job.tracker" to "local" on this cluster, then the I can run the job successfully. I guess in this case it doesn't have to launch the child JVM, which is the thing that is failing. > > > Marc > > -----Original Message----- > From: Marc Limotte [mailto:[EMAIL PROTECTED]] > Sent: Thursday, September 24, 2009 2:19 PM > To: [EMAIL PROTECTED] > Cc: Deept Kumar > Subject: RE: Task process exit with nonzero status of 1 > > Added DEBUG, but don't see anything interesting. The only new tasktracker log entries are about receiving a heartbeat from the JobTracker, or about cleaning up the task afterward. > > Tried the strace. It produces over 6mm lines of output. Not sure what I should be looking for. > > I'm thinking I might try the Cloudera Hadoop 0.20.0 distribution and see if the behavior is any different. > > Marc > > -----Original Message----- > From: Todd Lipcon [mailto:[EMAIL PROTECTED]] > Sent: Thursday, September 24, 2009 11:28 AM > To: [EMAIL PROTECTED] > Subject: Re: Task process exit with nonzero status of 1 > > Odd... > > Try bumping up the logs to debug level on that tasktracker, see what you can > determine? > > You could also strace -f -p <tasktracker pid> -o /tmp/tt_log and then grep > through those logs later to see what might be going on. > > -Todd > > On Thu, Sep 24, 2009 at 11:24 AM, Marc Limotte <[EMAIL PROTECTED]> wrote: > > >> Hi Todd. >> >> No userlogs seem to be created. I'm guessing, because the map task never >> actually starts. >> >> I don't see any other errors in the tasktracker log, other than the one I >> put in the first message ("java.io.IOException: Task process exit with >> nonzero status of 1..."). I've included the output from one of the nodes' >> tasktracker logs below. >> >> Any other suggestions? >> >> Marc >> >> 2009-09-24 18:15:36,955 INFO org.apache.hadoop.mapred.TaskTracker: >> LaunchTaskAction (registerTask): attempt_200909221656_0006_m_000003_0 task's >> state:UNASSIGNED >> 2009-09-24 18:15:36,959 INFO org.apache.hadoop.mapred.TaskTracker: Trying >> to launch : attempt_200909221656_0006_m_000003_0 >> 2009-09-24 18:15:36,960 INFO org.apache.hadoop.mapred.TaskTracker: In >> TaskLauncher, current free slots : 2 and trying to launch >> attempt_200909221656_0006_m_000003_02009-09-24 18:15:37,483 INFO >> org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: >> jvm_200909221656_0006_m_-145 >> 18051982009-09-24 18:15:37,483 INFO org.apache.hadoop.mapred.JvmManager: >> JVM Runner jvm_200909221656_0006_m_-1451805198 spawned. >> 2009-09-24 18:15:37,511 INFO org.apache.hadoop.mapred.JvmManager: JVM : >> jvm_200909221656_0006_m_-1451805198 exited. Number of t >> asks it ran: 02009-09-24 18:15:37,512 WARN >> org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_m_000003_0 >> Child Error >> java.io.IOException: Task process exit with nonzero status of 1. >> at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418) >> 2009-09-24 18:15:40,518 INFO org.apache.hadoop.mapred.TaskRunner: >> attempt_200909221656_0006_m_000003_0 done; removing files. >> 2009-09-24 18:15:40,519 INFO org.apache.hadoop.mapred.TaskTracker: >> addFreeSlot : current free slots : 2 >> 2009-09-24 18:15:42,964 INFO org.apache.hadoop.mapred.TaskTracker: >> LaunchTaskAction (registerTask): attempt_200909221656_0006_r
-
RE: Task process exit with nonzero status of 1Feng, Ao 2009-10-09, 17:47
I probably know what the problem it, as we are encountering the same issue on our prod cluster. Every once a while jobs start failing on the same task trackers, and the only error message is this exit status 1.
Go to the userlogs directory on the host where your tasks fail, and verify if there are 31,999 directories all looking like attempt_... Once you get to that point, JVM would run out of file descriptors, as it tries to create the 32,000 one. I confirmed that cleaning up the userlogs directory solves the problem... temporarily. So my questions are: 1. Where is the 32,000 limit imposed, and how do we change it? 2. In the Hadoop configuration, is there any parameter to specify when to automatically delete those old user logs (like after a month)? As we are running 0.19, I was hoping 0.20 has fixed it, but seems it is not the case. Can someone file a bug report/feature request, if there is no elegant solution at this time? Thanks, Ao -----Original Message----- From: Marc Limotte [mailto:[EMAIL PROTECTED]] Sent: Thursday, September 24, 2009 11:24 AM To: [EMAIL PROTECTED] Subject: RE: Task process exit with nonzero status of 1 Hi Todd. No userlogs seem to be created. I'm guessing, because the map task never actually starts. I don't see any other errors in the tasktracker log, other than the one I put in the first message ("java.io.IOException: Task process exit with nonzero status of 1..."). I've included the output from one of the nodes' tasktracker logs below. Any other suggestions? Marc 2009-09-24 18:15:36,955 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_200909221656_0006_m_000003_0 task's state:UNASSIGNED 2009-09-24 18:15:36,959 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_200909221656_0006_m_000003_0 2009-09-24 18:15:36,960 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_200909221656_0006_m_000003_02009-09-24 18:15:37,483 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_200909221656_0006_m_-145 18051982009-09-24 18:15:37,483 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_200909221656_0006_m_-1451805198 spawned. 2009-09-24 18:15:37,511 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_200909221656_0006_m_-1451805198 exited. Number of t asks it ran: 02009-09-24 18:15:37,512 WARN org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_m_000003_0 Child Error java.io.IOException: Task process exit with nonzero status of 1. at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418) 2009-09-24 18:15:40,518 INFO org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_m_000003_0 done; removing files. 2009-09-24 18:15:40,519 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2 2009-09-24 18:15:42,964 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_200909221656_0006_r _000001_0 task's state:UNASSIGNED2009-09-24 18:15:42,964 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_200909221656_0006_r_000001_0 2009-09-24 18:15:42,964 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_200909221656_0006_r_000001_02009-09-24 18:15:43,000 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_200909221656_0006_r_7885 020722009-09-24 18:15:43,000 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_200909221656_0006_r_788502072 spawned. 2009-09-24 18:15:43,026 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_200909221656_0006_r_788502072 exited. Number of tas ks it ran: 0 2009-09-24 18:15:43,026 WARN org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_r_000001_0 Child Error java.io.IOException: Task process exit with nonzero status of 1. at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)2009-09-24 18:15:46,034 INFO org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_r_000001_0 done; removing files. 2009-09-24 18:15:46,039 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2 2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_200909221656_0006_m _000002_1 task's state:UNASSIGNED 2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_200909221656_0006_m_000002_1 2009-09-24 18:16:34,022 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_200909221656_0006_m_000002_1 2009-09-24 18:16:34,060 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_200909221656_0006_m_-2120349138 2009-09-24 18:16:34,060 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_200909221656_0006_m_-2120349138 spawned. 2009-09-24 18:16:34,086 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_200909221656_0006_m_-2120349138 exited. Number of tasks it ran: 0 2009-09-24 18:16:34,087 WARN org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_m_000002_1 Child Error java.io.IOException: Task process exit with nonzero status of 1. at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418) 2009-09-24 18:16:37,094 INFO org.apache.hadoop.mapred.TaskRunner: attempt_200909221656_0006_m_000002_1 done; removing files. 2009-09-24 18:16:37,095 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2 2009-09-24 18:16:40,032 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_200909221656_0006_r_000000_1 task's state:UNASSIGNED 2009-09-24 18:16:40,032 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_200909221656_0006_r_000000_1 2009-09-24 18:16:40,032 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_200909221656_0006_r_000000_1 2009-09-24 18:16:40,057 IN
-
Re: Task process exit with nonzero status of 1Frank Singleton 2009-10-09, 18:27
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1 Feng, Ao wrote: > I probably know what the problem it, as we are encountering the same issue on our prod cluster. Every once a while jobs start failing on the same task trackers, and the only error message is this exit status 1. > > Go to the userlogs directory on the host where your tasks fail, and verify if there are 31,999 directories all looking like attempt_... Once you get to that point, JVM would run out of file descriptors, as it tries to create the 32,000 one. I confirmed that cleaning up the userlogs directory solves the problem... temporarily. > > So my questions are: > > 1. Where is the 32,000 limit imposed, and how do we change it? > As far as ext3 file system capabilities are concerned, http://en.wikipedia.org/wiki/Ext3 Specifically <quote> "There is a limit of 31998 sub-directories per one directory, stemming from its limit of 32000 links per inode" </quote> There is actually a funny story behind my personal experience with this (which I shall shorten for brevity) After I typed "ls <tab>" (to get the list of files/directories via bash completion) one day in a directory, the system came back (after a while) and said (from memory), Display all 31998 possibilities? (y or n) Hmm, where have I seen a number like (or close to) that before ? Cheers / Frank -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org iEYEARECAAYFAkrPgGMACgkQpZzN+MMic6cv2ACfQ7xTuIvXnx1VkmhNwJwW7Xlc lugAn38nuAOKcDUFx/BokcuPcHBEbmIH =O0fm -----END PGP SIGNATURE-----
-
RE: Task process exit with nonzero status of 1Marc Limotte 2009-10-27, 21:48
Just an FYI, found the solution to this problem.
Apparently, it's an OS limit on the number of sub-directories that can be created in another directory. In this case, we had 31998 sub-directories under hadoop/userlogs/, so any new tasks would fail in Job Setup. >From the unix command line, mkdir fails as well: $ mkdir hadoop/userlogs/testdir mkdir: cannot create directory `hadoop/userlogs/testdir': Too many links Difficult to track down because the Hadoop error message gives no hint whatsoever. And normally, you'd look in the userlog itself for more info, but in this case the userlog couldn't be created. Marc -----Original Message----- From: Marc Limotte Sent: Wednesday, September 23, 2009 11:06 AM To: '[EMAIL PROTECTED]' Subject: Task process exit with nonzero status of 1 I'm seeing this error when I try to run my job. java.io.IOException: Task process exit with nonzero status of 1. at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418) >From what I can find by doing some Google searches, this means the mapred task JVM has crashed. Not many suggestions about what to do about it. Some suggestions about increasing max heap. I tried that, although I don't think that's the issue because it's not a particularly memory intensive process and I've even tried it with a super small input data set of only a few records. Still see the same issue. Can't find anything else in the logs. I don't think my task even started, because there are no user logs created at all. Seems to fail during Job Setup. A little more background. This job was working fine for weeks, running hourly, and then failed on Saturday morning and hasn't worked since. Obviously, I looked for something that changed at that point, but no one was working at that time... can't find anything that changed. I tried the job with different input data sets, doesn't seem to matter, unless I run it with no data at all. The job does run with no input data, but if I have even a few input records it fails-doesn't seem to matter which records. I suspected some corruption in HDFS, but I was able to extract the data from HDFS (hadoop dfs -get ...) and the data looks ok. I also copied this data set to our TEST cluster and ran the job there... and it WORKED! Ran one of our other jobs and it failed as well, so it doesn't seem to be job specific either; looks like every job fails the same way. Did a complete reboot of the cluster-no impact. We're using Hadoop 0.20.0, and Java 1.6 update 16 on CentOS 5.2 64bit. Any suggestions on what could be wrong or where to look for more information would be appreciated. Marc Limotte Feeva Technology PRIVATE AND CONFIDENTIAL - NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT FOR ONLY THE INTENDED RECIPIENT OF THE TRANSMISSION, AND MAY BE A COMMUNICATION PRIVILEGE BY LAW. IF YOU RECEIVED THIS E-MAIL IN ERROR, ANY REVIEW, USE, DISSEMINATION, DISTRIBUTION, OR COPYING OF THIS EMAIL IS STRICTLY PROHIBITED. PLEASE NOTIFY US IMMEDIATELY OF THE ERROR BY RETURN E-MAIL AND PLEASE DELETE THIS MESSAGE FROM YOUR SYSTEM. |