Pig, mail # user - Running e2e RubyUDFs test in MR mode


Running e2e RubyUDFs test in MR mode
Cheolsoo Park 2012-06-08, 00:31
Hello,

I checked out branch-0.10, and I am trying to run e2e RubyUDFs tests in MR
mode. But I am getting the following error:

java.lang.IllegalStateException: Could not initialize interpreter (from file system or classpath) with /home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb
        at org.apache.pig.scripting.ScriptEngine.getScriptAsStream(ScriptEngine.java:145)
        at org.apache.pig.scripting.jruby.JrubyScriptEngine$RubyFunctions.getFromCache(JrubyScriptEngine.java:104)
        at org.apache.pig.scripting.jruby.JrubyScriptEngine$RubyFunctions.getFunctions(JrubyScriptEngine.java:120)
        at org.apache.pig.scripting.jruby.JrubyEvalFunc.initialize(JrubyEvalFunc.java:87)
        at org.apache.pig.scripting.jruby.JrubyEvalFunc.exec(JrubyEvalFunc.java:103)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:216)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:263)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:328)

Looking at the source code (ScriptEngine.java), I found
that scriptingudfs.rb should be found via classpath:

        if (file.exists()) {
            try {
                is = new FileInputStream(file);
            } catch (FileNotFoundException e) {
                throw new IllegalStateException("could not find existing file "+scriptPath, e);
            }
        } else {
            if (file.isAbsolute()) {
                // <-- lookup used for an absolute scriptPath
                is = ScriptEngine.class.getResourceAsStream(scriptPath);
            } else {
                is = ScriptEngine.class.getResourceAsStream("/" + scriptPath);
            }
        }

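For reference, here is a minimal sketch of how I understand the name resolution behind Class.getResourceAsStream(), based on its Javadoc: a name starting with "/" has that slash dropped before the class loader is asked for it, and any other name gets the package path of the class prepended. The class ResourceNameSketch and the helper classLoaderName are made up for illustration and are not part of Pig.

```java
public class ResourceNameSketch {

    // Hypothetical helper (not part of Pig): returns the name that
    // Class.getResourceAsStream(name) hands to the class loader, following
    // the rule described in the Class.getResourceAsStream Javadoc.
    static String classLoaderName(Class<?> anchor, String name) {
        if (name.startsWith("/")) {
            // Absolute resource name: the leading '/' is dropped.
            return name.substring(1);
        }
        // Relative resource name: the package of the anchor class is prepended.
        String className = anchor.getName();
        int lastDot = className.lastIndexOf('.');
        if (lastDot < 0) {
            return name; // anchor class lives in the default package
        }
        return className.substring(0, lastDot).replace('.', '/') + "/" + name;
    }

    public static void main(String[] args) {
        // Script path taken from the exception message in this thread.
        String scriptPath =
            "/home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb";
        System.out.println(classLoaderName(ResourceNameSketch.class, scriptPath));
    }
}
```

Run as is, this prints the script path without its leading slash, which under this reading is the entry name the class loader would have to find on the classpath.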
Now I looked at the Job jar generated by Pig and found that
scriptingudfs.rb indeed exists in that jar:

 cheolsoo@localhost:~/workspace/pig-cheolsoo $ jar tvf Job9203441412304345930.jar | grep scriptingudfs.rb
   2491 Thu Jun 07 14:42:44 PDT 2012 /home/cheolsoo/pig-0.10/test/e2e/pig/testdist/scriptingudfs.rb

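The grep above only shows that some entry ends in scriptingudfs.rb. A minimal sketch of a stricter check, assuming a class-loader lookup needs an exact entry name, could compare the jar's entry names against the name being requested. The class JarEntryCheck and both helper methods are hypothetical names for illustration; only the java.util.jar calls are standard JDK API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

public class JarEntryCheck {

    // Lists every entry whose name ends with the given suffix
    // (roughly what "jar tvf ... | grep" shows).
    static List<String> entriesEndingWith(String jarPath, String suffix) {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.stream()
                      .map(JarEntry::getName)
                      .filter(n -> n.endsWith(suffix))
                      .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reports whether the jar contains an entry with exactly this name.
    static boolean hasExactEntry(String jarPath, String resourceName) {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getEntry(resourceName) != null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Usage: java JarEntryCheck <jar> <resource name as requested>
        String jarPath = args[0];
        String resourceName = args[1];
        System.out.println("similar entries: "
            + entriesEndingWith(jarPath, "scriptingudfs.rb"));
        System.out.println("exact match: " + hasExactEntry(jarPath, resourceName));
    }
}
```

In the listing above the entry sits under testdist/, while the path in the exception includes libexec/ruby/, so an exact-name comparison like this seems worth running against the Job jar.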
Since scriptingudfs.rb is inside the Job jar, I would expect
getResourceAsStream() to find it, but apparently it does not.

I am wondering if anyone has been able to run these tests in MR mode and
could give me some pointers. Any help would be appreciated!

Thanks,
Cheolsoo

p.s. The test works fine in local mode, which is not surprising
since scriptingudfs.rb would be found via the file system. I also see a
similar issue with the e2e Jython tests, where the Jython scripts are not
found, with the following error:

2012-06-05 22:44:19,491 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2012-06-05 22:44:19,513 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2997: Unable to recreate exception from backed error: java.io.IOException: Deserialization error: could not instantiate 'org.apache.pig.scripting.jython.JythonFunction' with arguments '[/home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/python/scriptingudf.py, square]'