Mihail Ionescu 2012-12-08, 17:01
Ted Dunning 2012-12-08, 18:00
Mihail Ionescu 2012-12-10, 11:09
Re: JVM is crashing for systems with NFS
Hi,

I have experienced this with non-Hadoop applications, although we
typically saw the SIGBUS with Java 5. When it received the signal, the
JVM was trying to load a class from a jar on the NFS volume. In our
experience with whatever version of Java 6 we were using, the JVM
would throw ClassNotFoundExceptions as opposed to exiting due to a
SIGBUS. We were using SLES 10 or 11. At the end of the day, we decided
not to keep jar files on NFS.
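
If it helps to check whether any jar on a classpath actually lives on NFS,
something like the following might work. This is only a rough sketch, not
something we ran at the time: it assumes a Linux-style mount table (the
format of /proc/mounts), the function names are made up for illustration,
and the sample mount table is hypothetical, loosely modelled on the layout
described in the original message (NFS root, local disk under
/mnt/localdisk).

```python
import posixpath

# Filesystem type names that Linux reports for NFS mounts.
NFS_TYPES = {"nfs", "nfs4"}


def parse_mounts(mounts_text):
    """Return (mount_point, fs_type) pairs from /proc/mounts-style text.

    Escaped characters in mount points (e.g. \\040 for a space) are not
    decoded in this sketch.
    """
    mounts = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            mounts.append((fields[1], fields[2]))
    return mounts


def fs_type_for(path, mounts):
    """Return the filesystem type of the longest mount point containing path."""
    path = posixpath.normpath(path)
    best_len, best_type = -1, None
    for mount_point, fs_type in mounts:
        mp = mount_point.rstrip("/") or "/"
        prefix = mp.rstrip("/") + "/"
        if path == mp or path.startswith(prefix):
            # ">=" lets a later mount on the same mount point win.
            if len(mp) >= best_len:
                best_len, best_type = len(mp), fs_type
    return best_type


def jars_on_nfs(classpath, mounts):
    """Return the classpath entries that resolve to an NFS mount."""
    return [
        entry
        for entry in classpath.split(":")
        if entry and fs_type_for(entry, mounts) in NFS_TYPES
    ]


# Hypothetical mount table: NFS root plus a local data disk.
SAMPLE_MOUNTS = """\
server:/export/root / nfs rw,relatime 0 0
/dev/sdb1 /mnt/localdisk ext3 rw 0 0
"""

mounts = parse_mounts(SAMPLE_MOUNTS)
flagged = jars_on_nfs(
    "/mnt/localdisk/hadoop/lib/a.jar:/usr/share/java/b.jar", mounts
)
print(flagged)
```

On a real machine you would pass in the contents of /proc/mounts and the
classpath of the affected JVM instead of the sample values.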

Brock

On Sat, Dec 8, 2012 at 11:01 AM, Mihail Ionescu <[EMAIL PROTECTED]> wrote:
> I have a small cluster of 15 machines, running hadoop-1-0-2. Each machine
> runs kernel 2.6.35, has a root disk mounted over NFS (all machines have
> the same root file system) and a local disk (mounted under
> /mnt/localdisk). I installed Hadoop under /mnt/localdisk/hadoop, with the
> conf directory shared by all machines (in order to change the
> configuration for all machines easily). I am using jdk 1.6.23,
> installed locally on /mnt/localdisk/jdk. On each machine a datanode and a
> tasktracker are running; each tasktracker has 2 slots for mappers and 2
> slots for reducers.
>
> The problem is that, after running various map-reduce tasks, the JVM crashes
> pretty frequently on many machines. There is no pattern I could find:
> sometimes the datanode crashes, other times the tasktracker, or even
> both. They generate an hs_err file with SIGBUS (0x7). If needed I can post
> the contents of that file, but I could not find anything interesting there.
>
> Has anyone had this problem? Could it be because the root file system is
> shared across all machines and Hadoop tries to write some files in /tmp or
> similar? Any help would be greatly appreciated.
>
> Thanks,
>
> Mihail

--
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/