Amit Sela
2013-03-12, 11:34
Jean-Marc Spaggiari
2013-03-12, 11:40
Amit Sela
2013-03-12, 12:08
Leo Leung
2013-03-12, 16:57
George Datskos
2013-03-13, 02:57
Amit Sela
2013-03-13, 09:03
Amit Sela
2013-03-13, 12:10
Jim Twensky
2013-05-24, 17:00
-
Child error
Amit Sela 2013-03-12, 11:34
Hi all,
I have a weird failure occurring every now and then during a MapReduce job. This is the error:

java.lang.Throwable: Child Error
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
Caused by: java.io.IOException: Task process exit with nonzero status of 255.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)

And sometimes it's the same but with status of 126.

Any ideas?

Thanks.
-
Re: Child error
Jean-Marc Spaggiari 2013-03-12, 11:40
Hi Amit,
Which Hadoop version are you using?

I have been told it's because of https://issues.apache.org/jira/browse/MAPREDUCE-2374

JM
-
Re: Child error
Amit Sela 2013-03-12, 12:08
Hi Jean-Marc,
I am running Hadoop 1.0.3, and I did see the issue you mentioned, but the exit status in that issue is 126 and I sometimes get 255.

Any idea what these status codes mean?

Did you hit this issue and upgrade to 1.0.4? If so, how smooth is such an upgrade (it shouldn't differ from 1.0.3 that much, should it)?

Thanks!
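
The thread never answers the question about the status codes directly. As a rough aid, here is a minimal sketch of the general POSIX shell conventions for exit statuses. It is not Hadoop-specific, and the function names `as_reported_status` and `describe_exit_status` are hypothetical. A parent process sees only 8 bits of the exit code, so a child that exits with -1 is reported as 255, and a POSIX shell returns 126 when a command was found but could not be executed. Whether either of these is what actually happens to the task process here is an assumption.

```python
def as_reported_status(code: int) -> int:
    """Truncate an exit code to the 8 bits a POSIX parent process sees."""
    return code & 0xFF


def describe_exit_status(status: int) -> str:
    """Describe an exit status using common POSIX shell conventions.

    These are general conventions, not meanings defined by Hadoop.
    """
    if status == 0:
        return "success"
    if status == 126:
        return "command found but not executable"
    if status == 127:
        return "command not found"
    if status == 255:
        return "exit status out of range, often exit(-1) truncated to 8 bits"
    if 128 < status < 255:
        return f"terminated by signal {status - 128}"
    return "application-defined failure"


if __name__ == "__main__":
    # The three statuses that appear in this thread.
    for status in (1, 126, 255):
        print(status, "->", describe_exit_status(status))
```

Under these conventions, 126 would point towards a permissions or executability problem with whatever launches the task, while 255 says little more than that the child process itself gave up.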
-
RE: Child error
Leo Leung 2013-03-12, 16:57
or https://issues.apache.org/jira/browse/MAPREDUCE-4857
Which is fixed in 1.0.4
-
Re: Child error
George Datskos 2013-03-13, 02:57
Leo,
That JIRA says "fix version=1.0.4", but that is not correct. The real JIRA is MAPREDUCE-2374.

The actual fix version for this bug is 1.1.2.

George
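
To check a cluster's version against the fix version George reports, a comparison along these lines could be used. This is only a sketch: the value 1.1.2 is taken from George's message and not verified independently, and the names `parse_version`, `REPORTED_FIX_VERSION` and `includes_reported_fix` are hypothetical. It assumes plain dotted numeric versions such as 1.0.3.

```python
# Fix version as reported in the thread (an assumption, not verified here).
REPORTED_FIX_VERSION = "1.1.2"


def parse_version(version: str) -> tuple:
    """Turn a dotted numeric version such as '1.0.3' into (1, 0, 3)."""
    return tuple(int(part) for part in version.split("."))


def includes_reported_fix(running: str, fixed_in: str = REPORTED_FIX_VERSION) -> bool:
    """True if the running version is at or after the reported fix version."""
    # Tuples compare element by element, so 1.10.0 sorts after 1.9.0.
    return parse_version(running) >= parse_version(fixed_in)


if __name__ == "__main__":
    for version in ("1.0.3", "1.0.4", "1.1.2"):
        print(version, includes_reported_fix(version))
```

By this reading, neither 1.0.3 nor 1.0.4 would contain the fix unless the patch is applied by hand.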
-
Re: Child error
Amit Sela 2013-03-13, 09:03
But the patch will work on 1.0.4, correct?
-
Re: Child error
Amit Sela 2013-03-13, 12:10
10x

On Wed, Mar 13, 2013 at 1:56 PM, Azuryy Yu <[EMAIL PROTECTED]> wrote:
> Don't wait for the patch, it's a very simple fix. Just do it.
-
Re: Child Error
Jim Twensky 2013-05-24, 17:00
Hi again, in addition to my previous post, I was able to get some error logs from the task tracker/data node this morning, and it looks like it might be a Jetty issue:

2013-05-23 19:59:20,595 WARN org.apache.hadoop.mapred.TaskLog: Failed to retrieve stdout log for task: attempt_201305231647_0007_m_001096_0
java.io.IOException: Owner 'jim' for path /var/tmp/jim/hadoop-logs/userlogs/job_201305231647_0007/attempt_201305231647_0007_m_001096_0/stdout did not match expected owner '10929'
    at org.apache.hadoop.io.SecureIOUtils.checkStat(SecureIOUtils.java:177)
    at org.apache.hadoop.io.SecureIOUtils.openForRead(SecureIOUtils.java:117)
    at org.apache.hadoop.mapred.TaskLog$Reader.<init>(TaskLog.java:455)
    at org.apache.hadoop.mapred.TaskLogServlet.printTaskLog(TaskLogServlet.java:81)
    at org.apache.hadoop.mapred.TaskLogServlet.doGet(TaskLogServlet.java:296)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
    at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:848)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)

I am wondering if I am hitting MAPREDUCE-2389 <https://issues.apache.org/jira/browse/MAPREDUCE-2389>. If so, how do I downgrade my Jetty version? Should I just replace the Jetty jar file in the lib directory with an earlier version and restart my cluster?

Thank you.

On Thu, May 23, 2013 at 7:14 PM, Jim Twensky <[EMAIL PROTECTED]> wrote:
> Hello, I have a 20 node Hadoop cluster where each node has 8GB memory and an 8-core processor. I sometimes get the following error on a random basis:
>
> -----------------------------------------------------------------------------------------------------------
>
> Exception in thread "main" java.io.IOException: Exception reading file:/var/tmp/jim/hadoop-jim/mapred/local/taskTracker/jim/jobcache/job_201305231647_0005/jobToken
>     at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:135)
>     at org.apache.hadoop.mapreduce.security.TokenCache.loadTokens(TokenCache.java:165)
>     at org.apache.hadoop.mapred.Child.main(Child.java:92)
> Caused by: java.io.IOException: failure to login
>     at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:501)
>     at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:463)
>     at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:1519)
>     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1420)
>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
>     at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
>     at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:129)
>     ... 2 more
> Caused by: javax.security.auth.login.LoginException: java.lang.NullPointerException: invalid null input: name
>     at com.sun.security.auth.UnixPrincipal.<init>(UnixPrincipal.java:70)
>     at com.sun.security.auth.module.UnixLoginModule.login(UnixLoginModule.java:132)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>
> ......
>
> -----------------------------------------------------------------------------------------------------------
>
> This does not always happen, but I see a pattern: when the intermediate data is larger, it tends to occur more frequently. In the web log, I can see the following:
>
> java.lang.Throwable: Child Error
>     at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
> Caused by: java.io.IOException: Task process exit with nonzero status of 1.
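
The TaskLog warning above names the owner once as a user name ('jim') and once as a number ('10929'). One possible reading, which nothing in the thread confirms, is that the node sometimes fails to resolve a numeric uid to a user name. That would also be consistent with the "invalid null input: name" login failure. The sketch below shows how the owner of a path could be checked on an affected node under that assumption. The helper names `owner_of` and `owner_matches` are hypothetical, and the code assumes a Unix system where Python's `pwd` module is available.

```python
import os
import pwd


def owner_of(path: str):
    """Return (uid, user name or None) for the owner of path."""
    uid = os.stat(path).st_uid
    try:
        name = pwd.getpwuid(uid).pw_name
    except KeyError:
        # The uid has no entry in the user database visible on this node.
        name = None
    return uid, name


def owner_matches(expected: str, path: str) -> bool:
    """True if expected equals the owner's user name or its numeric uid."""
    uid, name = owner_of(path)
    return expected == str(uid) or (name is not None and expected == name)


if __name__ == "__main__":
    uid, name = owner_of(".")
    print("uid:", uid, "name:", name)
```

If `owner_of` returns a name of `None` for a task log directory, or returns different answers on repeated calls, the user lookup on that node (for example a remote directory service) would be a reasonable place to look before swapping Jetty jars.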