Re: executing files on hdfs via hadoop not possible? is JNI/JNA a reasonable solution?
You're conflating two things here. HDFS is a data storage filesystem;
MR execution does not depend on HDFS (generally speaking).

A reducer runs as a regular JVM on an assigned node, and can run any
program you'd like: download the program onto the node's local
filesystem and execute it there.
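
A minimal sketch of that pattern (the HDFS path, local path, and
binary name here are hypothetical, not from this thread): copy the
binary out of HDFS onto the node's local disk, set the execute bit
there (possible on a local POSIX filesystem, unlike on HDFS), and
launch it.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    import java.io.File;

    public class RunBinaryFromHdfs {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());

            // Hypothetical locations: a binary on HDFS, scratch space on the node.
            Path remote = new Path("/apps/bin/convert");
            File local = new File("/tmp/convert");

            // Pull the binary onto the node's local filesystem...
            fs.copyToLocalFile(remote, new Path(local.getAbsolutePath()));

            // ...where the execute bit CAN be set (it's a regular POSIX file now).
            local.setExecutable(true);

            // Run it like any other external program.
            Process p = new ProcessBuilder(local.getAbsolutePath(),
                    "input.img", "output.img").inheritIO().start();
            System.out.println("exit code: " + p.waitFor());
        }
    }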

If your goal is merely to run a regular program over data that is
sitting in HDFS, that can be achieved. If your library is in C, then
simply run it as a streaming program and use the libhdfs (C/C++) HDFS
API to read data from HDFS files into your functions. Would this not
suffice?
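
libhdfs is a thin JNI wrapper over the same Java FileSystem API; the
C side uses hdfsConnect()/hdfsOpenFile()/hdfsRead(). As a sketch of
the read loop such a wrapper would perform (the input path is
hypothetical), here is the Java equivalent:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReadSketch {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());

            // Hypothetical input file; in C this is hdfsConnect() +
            // hdfsOpenFile() + a loop over hdfsRead().
            try (FSDataInputStream in = fs.open(new Path("/data/images/part-00000"))) {
                byte[] buf = new byte[64 * 1024];
                int n;
                while ((n = in.read(buf)) > 0) {
                    // hand buf[0..n) to the conversion routine
                }
            }
        }
    }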

On Sun, Mar 17, 2013 at 3:09 PM, Julian Bui <[EMAIL PROTECTED]> wrote:
> Hi hadoop users,
>
> I just want to verify that there is no way to put a binary on HDFS and
> execute it using the hadoop java api.  If not, I would appreciate advice on
> creating an implementation that uses native libraries.
>
> "In contrast to the POSIX model, there are no sticky, setuid or setgid bits
> for files as there is no notion of executable files."  Is there no
> workaround?
>
> A little bit more about what I'm trying to do.  I have a binary that
> converts my image to another image format.  I currently want to put it in
> the distributed cache and tell the reducer to execute the binary on the data
> on hdfs.  However, since I can't set the execute permission bit on that
> file, it seems that I cannot do that.
>
> Since I cannot use the binary directly, it seems I have to write my own
> implementation.  The challenge is that the libraries I could use for this
> are .a and .so files.  Would I have to use JNI, package the libraries in
> the distributed cache, and then have the reducer find and use those
> libraries on the task nodes?  Actually, I wouldn't want to use JNI; I'd
> probably want to use Java Native Access (JNA) instead.  Has anyone used
> JNA with hadoop and been successful?  Are there problems I'll encounter?
>
> Please let me know.
>
> Thanks,
> -Julian

--
Harsh J
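
On the JNA question above: a task JVM is an ordinary JVM, so JNA works
there the same way it does anywhere else; the Hadoop-specific part is
just shipping the .so via the distributed cache so it lands on each
task node's local disk. A minimal sketch, assuming a hypothetical
libimageconv.so exposing int convert_image(const char *in, const char *out):

    import com.sun.jna.Library;
    import com.sun.jna.Native;

    public class JnaSketch {
        // Hypothetical native API, shipped as libimageconv.so via the
        // distributed cache.
        public interface ImageLib extends Library {
            int convert_image(String inPath, String outPath);
        }

        public static void main(String[] args) {
            // Assuming the cached .so is symlinked into the task's working
            // directory, point JNA's search path there.
            System.setProperty("jna.library.path", ".");

            ImageLib lib = (ImageLib) Native.loadLibrary("imageconv", ImageLib.class);
            System.out.println("convert_image returned "
                    + lib.convert_image("input.img", "output.img"));
        }
    }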