MapReduce >> mail # user >> How to import custom Python module in MapReduce job?


How to import custom Python module in MapReduce job?
(cross-posted from StackOverflow <http://stackoverflow.com/questions/18150208/how-to-import-custom-module-in-mapreduce-job?noredirect=1#comment26584564_18150208>)

I have a MapReduce job defined in the file *main.py*, which imports the module lib
from the file *lib.py*. I use Hadoop Streaming to submit this job to a Hadoop
cluster as follows:

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
    -files lib.py,main.py \
    -mapper "./main.py map" -reducer "./main.py reduce" \
    -input input -output output

As I understand it, this should place both main.py and lib.py in the *distributed
cache folder* on each compute node and thus make the module lib available
to main. But that doesn't happen: the log shows that the files *really are
copied* to the same directory, yet main can't import lib and throws
*ImportError*.
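
A small diagnostic can help check this expectation from inside a running task. The sketch below is only an illustration: the function name `describe_import_context` and the dictionary keys are hypothetical, and it assumes the helper is pasted at the top of main.py. It compares the task's working directory with the script's directory before and after symlinks are resolved, which matters because Python computes `sys.path[0]` from the script's location after following symlinks. Whether the cached files on a given cluster are symlinks is an assumption this message does not confirm.

```python
import os
import sys


def describe_import_context(script_path, module_file="lib.py"):
    """Collect the facts that decide whether `import lib` can succeed."""
    cwd = os.getcwd()
    # Directory of the script as invoked (symlinks NOT resolved).
    invoked_dir = os.path.dirname(os.path.abspath(script_path))
    # Directory of the script after resolving symlinks.
    real_dir = os.path.dirname(os.path.realpath(script_path))
    return {
        "cwd": cwd,
        "invoked_dir": invoked_dir,
        "real_dir": real_dir,
        "module_in_cwd": os.path.exists(os.path.join(cwd, module_file)),
        "module_in_real_dir": os.path.exists(os.path.join(real_dir, module_file)),
        "sys_path_0": sys.path[0] if sys.path else None,
    }


if __name__ == "__main__":
    # Streaming tasks use stdout for data, so diagnostics go to stderr.
    for key, value in sorted(describe_import_context(sys.argv[0]).items()):
        sys.stderr.write("%s=%r\n" % (key, value))
```

If `module_in_cwd` is true while `module_in_real_dir` is false, lib.py is visible in the working directory but not in the directory Python searches first, which would be consistent with the behaviour described here.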

Adding the current directory to the path didn't work:

import os
import sys
sys.path.append(os.path.realpath(__file__))
import lib  # ImportError

Loading the module manually, however, did the trick:

import imp
lib = imp.load_source('lib', 'lib.py')

But that's not what I want. So why can the Python interpreter see the other .py
files in the same directory, but not import them? Note that I have already tried
adding an empty __init__.py file to the same directory, without effect.
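
One detail of the path-based attempt is worth checking in isolation: `os.path.realpath(__file__)` is the path of the script file itself, whereas entries in `sys.path` are expected to be directories (or zip archives). The sketch below, written for Python 3, builds a throwaway directory and tries both kinds of entry. The helper `can_import` and the module name `demo_lib_for_path_check` are hypothetical, chosen only for this illustration. It does not establish that this is the cause of the ImportError on the cluster.

```python
import importlib
import os
import sys
import tempfile


def can_import(module_name, path_entry):
    """Return True if module_name imports with path_entry appended to sys.path."""
    sys.modules.pop(module_name, None)  # forget any earlier import
    importlib.invalidate_caches()
    sys.path.append(path_entry)
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False
    finally:
        # Undo the changes so repeated checks do not affect each other.
        sys.path.remove(path_entry)
        sys.modules.pop(module_name, None)


# Throwaway directory holding a stand-in module and a stand-in script.
workdir = tempfile.mkdtemp()
module_name = "demo_lib_for_path_check"  # hypothetical name, avoids clashes
with open(os.path.join(workdir, module_name + ".py"), "w") as f:
    f.write("VALUE = 1\n")
script = os.path.join(workdir, "main.py")
open(script, "w").close()

# Entry is the script FILE, as in the attempt quoted in the question.
file_entry_works = can_import(module_name, os.path.realpath(script))
# Entry is the DIRECTORY that contains the script.
dir_entry_works = can_import(module_name, os.path.dirname(os.path.realpath(script)))
```

Here `file_entry_works` ends up False and `dir_entry_works` ends up True. On a cluster, the directory that actually contains lib.py may differ from the resolved directory of main.py, so the directory to append would need to be confirmed separately.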
+
Binglin Chang 2013-08-12, 08:33
+
Binglin Chang 2013-08-12, 10:12