Re: division by zero in getLocalPathForWrite()
Ted Yu 2013-01-13, 16:39
I found this error again, see
https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/345/testReport/org.apache.hadoop.hbase.mapreduce/TestImportExport/testSimpleCase/

2013-01-12 11:53:52,809 WARN [AsyncDispatcher event handler] resourcemanager.RMAuditLogger(255): USER=jenkins OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_1357991604658_0002 failed 1 times due to AM Container for appattempt_1357991604658_0002_000001 exited with exitCode: -1000 due to: java.lang.ArithmeticException: / by zero
  at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:368)
  at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
  at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
  at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:115)
  at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getLocalPathForWrite(LocalDirsHandlerService.java:279)
  at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:851)
.Failing this attempt.. Failing the application. APPID=application_1357991604658_0002

Here is related code:

    // Keep rolling the wheel till we get a valid path
    Random r = new java.util.Random();
    while (numDirsSearched < numDirs && returnPath == null) {
      long randomPosition = Math.abs(r.nextLong()) % totalAvailable;

My guess is that totalAvailable was 0, meaning dirDF was empty.
Please advise whether that scenario is possible.

Cheers

On Tue, Oct 30, 2012 at 9:33 AM, Ted Yu <[EMAIL PROTECTED]> wrote:

> Thanks for the investigation Kihwal.
>
> I will keep an eye on future test failure in TestRowCounter.
>
>
> On Tue, Oct 30, 2012 at 9:29 AM, Kihwal Lee <[EMAIL PROTECTED]> wrote:
>
>> Ted,
>>
>> I couldn't reproduce it by just running the test case. When you reproduce
>> it, look at the stderr/stdout file somewhere under
>> target/org.apache.hadoop.mapred.MiniMRCluster. Look for the one under the
>> directory whose name containing the app id.
>>
>> I did run into a similar problem and the stderr said:
>> /bin/bash: /bin/java: No such file or directory
>>
>> It was because JAVA_HOME was not set. But in this case the exit code was
>> 127 (shell not being able to locate the command to exec). In the hudson
>> job, the exit code was 1, so I think it's something else.
>>
>> Kihwal
>>
>> On 10/29/12 11:56 PM, "Ted Yu" <[EMAIL PROTECTED]> wrote:
>>
>> >TestRowCounter still fails:
>> >
>> >https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/244/testReport/junit/org.apache.hadoop.hbase.mapreduce/TestRowCounter/testRowCounterNoColumn/
>> >
>> >but there was no 'divide by zero' exception.
>> >
>> >Cheers
>> >
>> >On Thu, Oct 25, 2012 at 8:04 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
>> >
>> >> I will try 2.0.2-alpha release.
>> >>
>> >> Cheers
>> >>
>> >>
>> >> On Thu, Oct 25, 2012 at 7:54 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
>> >>
>> >>> Thanks for the quick response, Robert.
>> >>> Here is the hadoop version being used:
>> >>> <hadoop-two.version>2.0.1-alpha</hadoop-two.version>
>> >>>
>> >>> If there is newer release, I am willing to try that before filing
>> >>> JIRA.
>> >>>
>> >>>
>> >>> On Thu, Oct 25, 2012 at 7:07 AM, Robert Evans <[EMAIL PROTECTED]> wrote:
>> >>>
>> >>>> It looks like you are running with an older version of 2.0, even
>> >>>> though it does not really make much of a difference in this case,
>> >>>> The issue shows up when getLocalPathForWrite thinks there is no
>> >>>> space on to write to on any of the disks it has configured. This
>> >>>> could be because you do not have any directories configured. I
>> >>>> really don't know for sure exactly what is
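
For what it's worth, the guess at the top of this message (totalAvailable being 0) is easy to check in isolation. The sketch below is not the Hadoop source: the class ZeroCapacitySketch and all of its method names are hypothetical, and only the modulo expression is taken from the quoted snippet. It shows that a long remainder by zero raises the same "/ by zero" ArithmeticException, and that summing over an empty set of directories gives a zero total.

```java
import java.util.Random;

// Hypothetical standalone sketch; NOT the LocalDirAllocator implementation.
class ZeroCapacitySketch {

    // Same expression as in the quoted snippet. With totalAvailable == 0 the
    // integer remainder throws java.lang.ArithmeticException: / by zero.
    static long pickPosition(Random r, long totalAvailable) {
        return Math.abs(r.nextLong()) % totalAvailable;
    }

    // Assumed model of how a total could end up 0: summing the available
    // space over the configured directories. An empty array yields 0.
    static long totalAvailable(long[] availablePerDir) {
        long total = 0;
        for (long available : availablePerDir) {
            total += available;
        }
        return total;
    }

    // Hypothetical guard: fail with a descriptive message instead of letting
    // the remainder operator throw.
    static long pickPositionGuarded(Random r, long totalAvailable) {
        if (totalAvailable <= 0) {
            throw new IllegalStateException(
                "no usable local directory (totalAvailable=" + totalAvailable + ")");
        }
        return Math.abs(r.nextLong()) % totalAvailable;
    }

    // Helper for experimenting: reports whether the unguarded pick throws.
    static boolean throwsDivideByZero(long totalAvailable) {
        try {
            pickPosition(new Random(), totalAvailable);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }
}
```

Calling ZeroCapacitySketch.throwsDivideByZero(0L) exercises the failing path, while pickPositionGuarded(new Random(), 0L) shows what a more descriptive failure could look like. Whether the real allocator can actually reach a zero total is still the open question.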