Re: Problem running a Hadoop program with external libraries
I don't know if putting native-code .so files inside a jar works. A
native-code .so is not "classloaded" in the same way .class files are.
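
For what it's worth, here is a tiny, untested sketch of how the JVM
actually locates a JNI library (the library name jniopencv_highgui is just
the JavaCV example from this thread):

    // Untested illustration: how a JVM resolves a native library at runtime.
    public class NativeLoadCheck {
        public static void main(String[] args) {
            // The directories the JVM will search for native libraries:
            System.out.println(System.getProperty("java.library.path"));

            // Looks for libjniopencv_highgui.so in those directories only;
            // a .so packed inside a jar on the classpath is never found here.
            System.loadLibrary("jniopencv_highgui");

            // Alternative: System.load() takes an absolute path to a file
            // that must exist on the local disk, e.g.
            // System.load("/path/to/native/libs/libjniopencv_highgui.so");
        }
    }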

So the correct .so files probably need to exist in some physical directory
on the worker machines. You may want to double-check that the correct
directory on the worker machines is identified in the JVM property
'java.library.path' (instead of, or in addition to, $LD_LIBRARY_PATH). This
can be set via the Hadoop configuration property mapred.child.java.opts
(include '-Djava.library.path=/path/to/native/libs' in the string there).
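
If editing mapred-site.xml on every node is inconvenient, the same property
can also be set per job from the driver. A rough, untested sketch follows
(the class and job names are placeholders; the directory is the one you
mentioned). Note that, if I remember correctly, overriding
mapred.child.java.opts replaces the default value (-Xmx200m), so the heap
size should be restated:

    // Rough, untested sketch: set child-JVM options from the job driver.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class DriverSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Overriding mapred.child.java.opts replaces the default
            // (-Xmx200m), so restate the heap size with java.library.path.
            conf.set("mapred.child.java.opts",
                "-Xmx200m -Djava.library.path="
                + "/home/ngc/hadoop-0.21.0/lib/native/Linux-amd64-64");
            Job job = new Job(conf, "opencv-job");
            // ... set mapper, reducer, and input/output paths as usual,
            // then submit with job.waitForCompletion(true).
        }
    }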

Also, if you added your .so files to a directory that is already used by the
tasktracker (like hadoop-0.21.0/lib/native/Linux-amd64-64/), you may need to
restart the tasktracker instance for it to take effect. (This is true of
.jar files in the $HADOOP_HOME/lib directory; I don't know if it is true for
native libs as well.)

- Aaron
On Fri, Mar 4, 2011 at 12:53 PM, Ratner, Alan S (IS) <[EMAIL PROTECTED]> wrote:

> We are having difficulty running a Hadoop program that makes calls to
> external libraries - but this occurs only when we run the program on our
> cluster and not from within Eclipse, where we are apparently running in
> Hadoop's standalone mode.  This program invokes the Open Computer Vision
> libraries (OpenCV and JavaCV).  (I don't think there is a problem with our
> cluster - we've run many Hadoop jobs on it without difficulty.)
>
> 1.      I normally use Eclipse to create jar files for our Hadoop programs,
> but I inadvertently hit the "run as Java application" button and the program
> ran fine, reading the input file from the Eclipse workspace rather than HDFS
> and writing the output file to the same place.  Hadoop's output appears
> below.  (This occurred on the master Hadoop server.)
>
> 2.      I then "exported" from Eclipse a "runnable jar" which "extracted
> required libraries" into the generated jar - presumably producing a jar file
> that incorporated all the required library functions. (The plain jar file
> for this program is 17 kB while the runnable jar is 30 MB.)  When I try to
> run this on my Hadoop cluster (including my master and slave servers) the
> program reports that it is unable to locate "libopencv_highgui.so.2.2:
> cannot open shared object file: No such file or directory".  Now, in
> addition to this library being incorporated inside the runnable jar file, it
> is also present on each of my servers at
> hadoop-0.21.0/lib/native/Linux-amd64-64/, where we have loaded the same
> libraries (to give Hadoop two shots at finding them).  These include:
>      ...
>      libopencv_highgui_pch_dephelp.a
>      libopencv_highgui.so
>      libopencv_highgui.so.2.2
>      libopencv_highgui.so.2.2.0
>      ...
>
>      When I poke around inside the runnable jar I find
> javacv_linux-x86_64.jar which contains:
>      com/googlecode/javacv/cpp/linux-x86_64/libjniopencv_highgui.so
>
> 3.      I then tried adding the following to mapred-site.xml, as suggested
> in HADOOP-2838, which is supposed to be included in Hadoop 0.21
> (https://issues.apache.org/jira/browse/HADOOP-2838):
>      <property>
>        <name>mapred.child.env</name>
>
>  <value>LD_LIBRARY_PATH=/home/ngc/hadoop-0.21.0/lib/native/Linux-amd64-64</value>
>      </property>
>      The log is included at the bottom of this email; Hadoop now complains
> about a different missing library and reports an out-of-memory error.
>
> Does anyone have any ideas as to what is going wrong here?  Any help would
> be appreciated.  Thanks.
>
> Alan
>
>
> BTW: Each of our servers has 4 hard drives, and many of the errors below
> refer to the 3 drives (/media/hd2, hd3, or hd4) reserved exclusively for
> HDFS, which are thus perhaps not a good place for Hadoop to be looking for a
> library file.  My slaves have 24 GB RAM, the jar file is 30 MB, and the
> sequence file being read is 400 KB, so I hope I am not running out of
> memory.
>
>
> 1.      RUNNING DIRECTLY FROM ECLIPSE IN HADOOP'S STANDALONE MODE - SUCCESS
>