-
division by zero in getLocalPathForWrite()
Ted Yu 2012-10-25, 04:54
Hi,

HBase has a Jenkins build against Hadoop 2.0. I was checking why TestRowCounter sometimes failed:
https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/231/testReport/org.apache.hadoop.hbase.mapreduce/TestRowCounter/testRowCounterExclusiveColumn/

I think the following could be the cause:

2012-10-22 23:46:32,571 WARN [AsyncDispatcher event handler] resourcemanager.RMAuditLogger(255): USER=jenkins OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_1350949562159_0002 failed 1 times due to AM Container for appattempt_1350949562159_0002_000001 exited with exitCode: -1000 due to: java.lang.ArithmeticException: / by zero
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:355)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:115)
    at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getLocalPathForWrite(LocalDirsHandlerService.java:257)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:849)

However, I can't find where in getLocalPathForWrite() a division by zero could have arisen.

Comments and hints are welcome.

Thanks
-
Re: division by zero in getLocalPathForWrite()
Robert Evans 2012-10-25, 14:07
It looks like you are running with an older version of 2.0, although that does not really make much of a difference in this case. The issue shows up when getLocalPathForWrite thinks there is no space to write to on any of the disks it has configured. This could be because you do not have any directories configured. I really don't know for sure exactly what is happening. It might be disk fail-in-place removing disks for you because of other issues. Either way, we should file a JIRA against Hadoop so that we never get the / by zero error and so that the possible causes are handled in a better way.

--Bobby Evans
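The failure mode Bobby describes can be shown with a small sketch. This is not the Hadoop source: the class and method names (ZeroSpaceDemo, pickIndex, failsWithDivideByZero) are hypothetical, and the sketch only assumes that a directory is chosen by taking a random number modulo the total free space across the configured directories. Under that assumption, an empty directory list or a list where every directory reports zero free space makes the total 0, and Java's integer remainder then throws ArithmeticException.

```java
import java.util.Random;

class ZeroSpaceDemo {
    // Hypothetical helper: picks a directory index with probability
    // proportional to the free space reported for each directory.
    static int pickIndex(long[] availableBytes, Random random) {
        long total = 0;
        for (long available : availableBytes) {
            total += available;
        }
        // Integer remainder by zero throws ArithmeticException, so this
        // line fails when total == 0. The mask keeps the value non-negative.
        long position = (random.nextLong() & Long.MAX_VALUE) % total;
        int i = 0;
        while (position >= availableBytes[i]) {
            position -= availableBytes[i];
            i++;
        }
        return i;
    }

    // Returns true when picking a directory fails with ArithmeticException.
    static boolean failsWithDivideByZero(long[] availableBytes) {
        try {
            pickIndex(availableBytes, new Random());
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // No directories configured at all.
        System.out.println(failsWithDivideByZero(new long[] {}));
        // Directories configured, but none reports free space.
        System.out.println(failsWithDivideByZero(new long[] {0L, 0L}));
        // At least one directory with free space.
        System.out.println(failsWithDivideByZero(new long[] {0L, 10L}));
    }
}
```

Both "no directories" and "directories with no usable space" end in the same exception, which is why the stack trace alone does not say which of the two happened.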
-
Re: division by zero in getLocalPathForWrite()
Ted Yu 2012-10-25, 14:54
Thanks for the quick response, Robert.

Here is the Hadoop version being used:
<hadoop-two.version>2.0.1-alpha</hadoop-two.version>

If there is a newer release, I am willing to try that before filing a JIRA.
-
Re: division by zero in getLocalPathForWrite()
Ted Yu 2012-10-25, 15:04
I will try the 2.0.2-alpha release.

Cheers
-
Re: division by zero in getLocalPathForWrite()
Ted Yu 2012-10-30, 04:56
TestRowCounter still fails:
https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/244/testReport/junit/org.apache.hadoop.hbase.mapreduce/TestRowCounter/testRowCounterNoColumn/

but there was no 'divide by zero' exception.

Cheers
-
Re: division by zero in getLocalPathForWrite()
Kihwal Lee 2012-10-30, 16:29
Ted,

I couldn't reproduce it by just running the test case. When you reproduce it, look at the stderr/stdout files somewhere under target/org.apache.hadoop.mapred.MiniMRCluster. Look for the ones under the directory whose name contains the app id.

I did run into a similar problem and the stderr said:
/bin/bash: /bin/java: No such file or directory

It was because JAVA_HOME was not set. But in that case the exit code was 127 (the shell not being able to locate the command to exec). In the Hudson job, the exit code was 1, so I think it's something else.

Kihwal
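The exit code 127 Kihwal mentions is the conventional status a POSIX shell returns when it cannot find the command it was asked to run. The sketch below illustrates that behaviour and is not taken from Hadoop or the test. It assumes a POSIX shell at /bin/sh, and ExitCodeDemo, exitCodeOf, and the path /nonexistent-dir/bin/java are hypothetical names chosen for the example.

```java
import java.io.IOException;

class ExitCodeDemo {
    // Runs a command line through /bin/sh and returns the shell's exit code.
    // The child's stdout and stderr are passed through to this process,
    // so the shell's own error message stays visible.
    static int exitCodeOf(String commandLine) throws IOException, InterruptedException {
        Process process = new ProcessBuilder("/bin/sh", "-c", commandLine)
                .inheritIO()
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // A java binary path that does not exist, similar to what an unset
        // JAVA_HOME produces when a launch script expands $JAVA_HOME/bin/java.
        System.out.println(exitCodeOf("/nonexistent-dir/bin/java -version"));
        // A command that runs but reports failure returns its own status.
        System.out.println(exitCodeOf("exit 1"));
    }
}
```

Comparing the two statuses is what lets a missing-binary problem be told apart from a container that started and then failed on its own.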
-
Re: division by zero in getLocalPathForWrite()
Ted Yu 2012-10-30, 16:33
Thanks for the investigation, Kihwal.

I will keep an eye on future test failures in TestRowCounter.
-
Re: division by zero in getLocalPathForWrite()
Ted Yu 2013-01-13, 16:39
I found this error again, see
https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/345/testReport/org.apache.hadoop.hbase.mapreduce/TestImportExport/testSimpleCase/

2013-01-12 11:53:52,809 WARN [AsyncDispatcher event handler] resourcemanager.RMAuditLogger(255): USER=jenkins OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_1357991604658_0002 failed 1 times due to AM Container for appattempt_1357991604658_0002_000001 exited with exitCode: -1000 due to: java.lang.ArithmeticException: / by zero
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:368)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:115)
    at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getLocalPathForWrite(LocalDirsHandlerService.java:279)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:851)
.Failing this attempt.. Failing the application. APPID=application_1357991604658_0002

Here is the related code:

    // Keep rolling the wheel till we get a valid path
    Random r = new java.util.Random();
    while (numDirsSearched < numDirs && returnPath == null) {
      long randomPosition = Math.abs(r.nextLong()) % totalAvailable;

My guess is that totalAvailable was 0, meaning dirDF was empty.

Please advise whether that scenario is possible.

Cheers
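If totalAvailable can indeed be 0, one way to avoid the bare ArithmeticException would be to check the total before the remainder and report a descriptive error instead. The sketch below is only an assumption about how such a guard might look. It is not the LocalDirAllocator code and not the fix that was eventually applied. GuardedPicker, sumAvailable, randomPosition, and the error message are hypothetical.

```java
import java.util.Random;

class GuardedPicker {
    // Sums the free space reported for each candidate directory.
    static long sumAvailable(long[] availableBytes) {
        long total = 0;
        for (long available : availableBytes) {
            total += available;
        }
        return total;
    }

    // Returns a position in [0, totalAvailable), or fails with a message that
    // names the real problem when there is nothing to choose from.
    static long randomPosition(long[] availableBytes, Random random) {
        long totalAvailable = sumAvailable(availableBytes);
        if (totalAvailable == 0) {
            throw new IllegalStateException(
                "No space available in any of the configured local directories");
        }
        // Masking the sign bit keeps the value non-negative; Math.abs would
        // return a negative number for Long.MIN_VALUE.
        return (random.nextLong() & Long.MAX_VALUE) % totalAvailable;
    }

    public static void main(String[] args) {
        System.out.println(randomPosition(new long[] {100L, 200L}, new Random()));
        try {
            randomPosition(new long[] {}, new Random());
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a guard like this, the audit log would carry a message pointing at the local directories rather than "/ by zero".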
-
Re: division by zero in getLocalPathForWrite()
Steve Loughran 2013-01-14, 12:34
It certainly looks possible. Can you file a JIRA issue on the problem?
-
Re: division by zero in getLocalPathForWrite()
Ted Yu 2013-01-14, 15:01
MAPREDUCE-4940 has been logged.

Thanks