MapReduce, mail # dev - division by zero in getLocalPathForWrite()


Re: division by zero in getLocalPathForWrite()
Ted Yu 2013-01-13, 16:39
I found this error again, see
https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/345/testReport/org.apache.hadoop.hbase.mapreduce/TestImportExport/testSimpleCase/

2013-01-12 11:53:52,809 WARN  [AsyncDispatcher event handler] resourcemanager.RMAuditLogger(255): USER=jenkins OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_1357991604658_0002 failed 1 times due to AM Container for appattempt_1357991604658_0002_000001 exited with exitCode: -1000 due to: java.lang.ArithmeticException: / by zero
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:368)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:115)
    at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getLocalPathForWrite(LocalDirsHandlerService.java:279)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:851)
.Failing this attempt.. Failing the application. APPID=application_1357991604658_0002

Here is related code:

        // Keep rolling the wheel till we get a valid path
        Random r = new java.util.Random();
        while (numDirsSearched < numDirs && returnPath == null) {
          long randomPosition = Math.abs(r.nextLong()) % totalAvailable;

My guess is that totalAvailable was 0, meaning dirDF was empty.
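
To illustrate, here is a minimal standalone sketch (my own, not the Hadoop
code; the variable name mirrors the snippet above) showing that the modulo
throws exactly this exception once totalAvailable is 0:

    import java.util.Random;

    public class ZeroCapacityRepro {
      public static void main(String[] args) {
        // Stand-in for the summed available space across the configured
        // local dirs (dirDF); zero models "no usable capacity reported".
        long totalAvailable = 0L;

        Random r = new Random();
        // Same expression as in getLocalPathForWrite(); a zero divisor
        // throws java.lang.ArithmeticException: / by zero.
        long randomPosition = Math.abs(r.nextLong()) % totalAvailable;
        System.out.println(randomPosition);
      }
    }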

Please advise whether that scenario is possible.

Cheers

On Tue, Oct 30, 2012 at 9:33 AM, Ted Yu <[EMAIL PROTECTED]> wrote:

> Thanks for the investigation Kihwal.
>
> I will keep an eye on future test failures in TestRowCounter.
>
>
> On Tue, Oct 30, 2012 at 9:29 AM, Kihwal Lee <[EMAIL PROTECTED]> wrote:
>
>> Ted,
>>
>> I couldn't reproduce it by just running the test case. When you reproduce
>> it, look at the stderr/stdout file somewhere under
>> target/org.apache.hadoop.mapred.MiniMRCluster. Look for the one under the
>> directory whose name contains the app id.
>>
>> I did run into a similar problem and the stderr said:
>> /bin/bash: /bin/java: No such file or directory
>>
>> It was because JAVA_HOME was not set. But in this case the exit code was
>> 127 (the shell not being able to locate the command to exec). In the Hudson
>> job, the exit code was 1, so I think it's something else.
>>
>> Kihwal
>>
>> On 10/29/12 11:56 PM, "Ted Yu" <[EMAIL PROTECTED]> wrote:
>>
>> >TestRowCounter still fails:
>> >
>> >https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/244/testReport/junit/org.apache.hadoop.hbase.mapreduce/TestRowCounter/testRowCounterNoColumn/
>> >
>> >but there was no 'divide by zero' exception.
>> >
>> >Cheers
>> >
>> >On Thu, Oct 25, 2012 at 8:04 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
>> >
>> >> I will try 2.0.2-alpha release.
>> >>
>> >> Cheers
>> >>
>> >>
>> >> On Thu, Oct 25, 2012 at 7:54 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
>> >>
>> >>> Thanks for the quick response, Robert.
>> >>> Here is the hadoop version being used:
>> >>>     <hadoop-two.version>2.0.1-alpha</hadoop-two.version>
>> >>>
>> >>> If there is a newer release, I am willing to try that before filing a
>> >>> JIRA.
>> >>>
>> >>>
>> >>> On Thu, Oct 25, 2012 at 7:07 AM, Robert Evans
>> >>><[EMAIL PROTECTED]>wrote:
>> >>>
>> >>>> It looks like you are running with an older version of 2.0, even
>> >>>> though it does not really make much of a difference in this case.
>> >>>> The issue shows up when getLocalPathForWrite thinks there is no
>> >>>> space to write to on any of the disks it has configured. This could
>> >>>> be because you do not have any directories configured. I really
>> >>>> don't know for sure exactly what is
>