MapReduce, mail # user - executing files on hdfs via hadoop not possible? is JNI/JNA a reasonable solution?


Julian Bui 2013-03-17, 09:39

Re: executing files on hdfs via hadoop not possible? is JNI/JNA a reasonable solution?
Harsh J 2013-03-17, 09:50
You're confusing two things here. HDFS is a data storage filesystem;
MR is the execution layer and, generally speaking, does not depend on
HDFS.

A reducer runs as a regular JVM on a provided node, and can execute
any program you'd like it to by downloading it onto its configured
local filesystem and executing it.
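To illustrate the point above, a task can mark a file on its local filesystem executable and launch it as a child process. This is only a sketch of the mechanism (the script standing in for the converter binary is hypothetical, and in a real job the file would arrive via the distributed cache):

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class LocalExec {
    // Marks the local copy of a program executable and runs it, returning
    // its exit code. The execute bit is set on the *local* filesystem copy,
    // not on HDFS, which is why the HDFS permission model is not a blocker.
    public static int runLocalBinary(File binary, String... args)
            throws IOException, InterruptedException {
        binary.setExecutable(true);  // equivalent of chmod +x on the local copy
        String[] cmd = new String[args.length + 1];
        cmd[0] = binary.getAbsolutePath();
        System.arraycopy(args, 0, cmd, 1, args.length);
        Process p = new ProcessBuilder(cmd)
                .inheritIO()         // child stdout/stderr go to the task logs
                .start();
        return p.waitFor();          // block until the program finishes
    }

    public static void main(String[] args) throws Exception {
        // Demo: a tiny shell script stands in for the converter binary.
        File script = File.createTempFile("converter", ".sh");
        Files.write(script.toPath(), "#!/bin/sh\nexit 0\n".getBytes());
        int rc = runLocalBinary(script);
        System.out.println("exit code: " + rc);
        script.delete();
    }
}
```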

If your goal is merely to run a regular program over data that is
sitting in HDFS, that can be achieved. If your library is in C then
simply use a streaming program to run it and use libhdfs' HDFS API
(C/C++) to read data into your functions from HDFS files. Would this
not suffice?
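As a concrete illustration of the streaming route: ship the native binary with the job via `-files` and invoke it as the mapper. This cannot run outside a cluster, and the jar location, binary name, and HDFS paths below are all hypothetical:

```sh
# Ship the native converter with the job and run it over the input files.
# Jar path and file names are illustrative; adjust to your install.
hadoop jar "$HADOOP_HOME"/share/hadoop/tools/lib/hadoop-streaming-*.jar \
  -files convert_image \
  -input /data/images/input \
  -output /data/images/output \
  -mapper ./convert_image \
  -reducer NONE
```

The `-files` option copies the binary onto each task's local working directory, so `./convert_image` resolves locally at run time.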

On Sun, Mar 17, 2013 at 3:09 PM, Julian Bui <[EMAIL PROTECTED]> wrote:
> Hi hadoop users,
>
> I just want to verify that there is no way to put a binary on HDFS and
> execute it using the hadoop Java API.  If not, I would appreciate advice on
> creating an implementation that uses native libraries.
>
> "In contrast to the POSIX model, there are no sticky, setuid or setgid bits
> for files as there is no notion of executable files."  Is there no
> workaround?
>
> A little bit more about what I'm trying to do.  I have a binary that
> converts my image to another image format.  I currently want to put it in
> the distributed cache and tell the reducer to execute the binary on the data
> on hdfs.  However, since I can't set the execute permission bit on that
> file, it seems that I cannot do that.
>
> Since I cannot use the binary directly, it seems I have to write my own
> implementation.  The challenge is that the libraries I could use for this
> are .a and .so files.  Would I have to use JNI and package
> the libraries in the distributed cache and then have the reducer find and
> use those libraries on the task nodes?  Actually, I wouldn't want to use
> JNI, I'd probably want to use java native access (JNA) to do this.  Has
> anyone used JNA with hadoop and been successful?  Are there problems I'll
> encounter?
>
> Please let me know.
>
> Thanks,
> -Julian

--
Harsh J
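On the JNA question raised in the quoted mail: a `.so` shipped via the distributed cache lands on the task's local filesystem, where JNA can load it. A minimal sketch, assuming a library `libconvert.so` exposing `int convert_image(const char *in, const char *out)` — both the library name and the function signature are hypothetical, and the JNA jar must be on the task classpath (older JNA versions use `Native.loadLibrary` instead of `Native.load`):

```java
import com.sun.jna.Library;
import com.sun.jna.Native;

public class NativeConvert {
    // Maps the hypothetical C function:
    //   int convert_image(const char *in, const char *out);
    public interface ConvertLib extends Library {
        int convert_image(String inPath, String outPath);
    }

    public static void main(String[] args) {
        // Point JNA at the task's working directory, where the distributed
        // cache placed libconvert.so.
        System.setProperty("jna.library.path", ".");
        ConvertLib lib = Native.load("convert", ConvertLib.class);
        int rc = lib.convert_image(args[0], args[1]);
        System.out.println("convert_image returned " + rc);
    }
}
```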
Julian Bui 2013-03-17, 10:50
Harsh J 2013-03-17, 13:28