Hi Hadoop users,
As a Hadoop newbie, I'm trying to figure out which interface would be best
and easiest for implementing my application: 1) Hadoop Pipes, 2) Java with
JNI, or 3) something else that I'm not aware of yet.
I will use Hadoop to take pictures as input and produce JPEG pictures as
output. I don't think I need a reducer.
- I want to use libjpeg.a (a native static library) in my Hadoop
  application. If I use Hadoop Pipes, I should be able to link libjpeg.a
  statically into my application. If I use the Java interface with JNI, I
  think I have to ship the libjpeg.a library along with my Hadoop jobs.
  Is that right? Is that easy?
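For reference, here is my rough understanding of the JNI loading side,
assuming I build libjpeg.a plus a thin JNI wrapper into a shared library and
ship it into the task's working directory. The library name "jpegwrap" and
the native method are hypothetical:

```java
import java.io.File;

// Sketch of the Java/JNI side only. "jpegwrap" is a hypothetical shared
// library (libjpeg.a linked with a small JNI wrapper) that the job would
// ship alongside the jar so it lands in the task's working directory.
public class JpegCodec {
    // Native entry point implemented in the C wrapper (hypothetical).
    public static native byte[] transcode(byte[] rawImage);

    static {
        // System.mapLibraryName gives the platform-specific file name,
        // e.g. "jpegwrap" -> "libjpegwrap.so" on Linux.
        String libName = System.mapLibraryName("jpegwrap");
        File local = new File(".", libName);
        if (local.exists()) {
            // Load from the task's working directory, where shipped
            // files are unpacked.
            System.load(local.getAbsolutePath());
        } else {
            // Fall back to java.library.path for local testing.
            try {
                System.loadLibrary("jpegwrap");
            } catch (UnsatisfiedLinkError e) {
                System.err.println("jpegwrap not found: " + e.getMessage());
            }
        }
    }

    public static void main(String[] args) {
        // Just show the platform-specific name the loader looks for.
        System.out.println(System.mapLibraryName("jpegwrap"));
    }
}
```

Is that roughly the right pattern, or is there a more Hadoop-native way to
distribute the library with the job?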
- I need to be able to write uniquely named files into HDFS (i.e., I need
  to name the output files so that I know which inputs they were created
  from). If I recall correctly, the Hadoop Streaming interface doesn't let
  you do this, because it only deals with stdin/stdout. Does Hadoop Pipes
  have a similar constraint, or will it allow me to write uniquely named
  files?
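To make the naming requirement concrete, this is roughly what I have in
mind: derive the output name from the input picture's path plus the task
number, then open that name directly on HDFS from the mapper. The helper
class and paths below are hypothetical:

```java
// Hypothetical helper: derive a traceable output file name from the
// input picture's path and the map task number, so each output JPEG
// records which input it was created from.
public class OutputNamer {
    public static String uniqueName(String inputPath, int taskId) {
        // Strip the directory part of the path.
        String base = inputPath.substring(inputPath.lastIndexOf('/') + 1);
        // Strip the original extension, if any.
        int dot = base.lastIndexOf('.');
        if (dot > 0) {
            base = base.substring(0, dot);
        }
        return base + "-task" + taskId + ".jpg";
    }

    public static void main(String[] args) {
        // e.g. input /user/me/pics/cat.jpeg processed by map task 3
        System.out.println(uniqueName("/user/me/pics/cat.jpeg", 3));
        // -> cat-task3.jpg
    }
}
```

In the Java API I believe the mapper could then write the file itself via
FileSystem.create() rather than going through the normal collector. Is
something equivalent possible from Pipes?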
- I need to be able to exploit the locality of the data: the application
  should run on the same machine that holds the input data (pictures).
  Does the Hadoop Pipes interface allow me to do this?
- When I tried to learn more about the Hadoop Pipes API, all I could
  find was the one Submitter class. Is that really all there is, or is
  there more?
- I'm not really familiar with SWIG, which is apparently to be used with
  Pipes. All I could really find was the same simple word-count example on
  every site. Does SWIG get difficult to use for more complex projects?