Xiaomeng Wan 2011-01-24, 23:54
Hi, I want to write a Python UDF that splits a string into a bag:
------------------------------------------------------------
#!/usr/bin/python

import re

@outputSchema("y:bag{t:tuple(word:chararray)}")
def strsplittobag(content, regex):
    return re.compile(regex).split(content)
------------------------------------------------------------
It fails with the error "could not instantiate 'org.apache.pig.scripting.jython.JythonFunction' with arguments '[/home/.../mypyudfs.py, strsplittobag]'". I have other Python UDFs working, so it shouldn't be a configuration problem. I am new to Python; did I miss anything?
Thanks!
Shawn
-
Re: python udf doesnt work
Daniel Dai 2011-01-25, 00:28
Put build/ivy/lib/Pig/jython-2.5.0.jar on your classpath (if it is not there, run ant first). This is a bug we need to fix.
Daniel
-
Re: python udf doesnt work
Xiaomeng Wan 2011-01-25, 20:50
Hi Daniel,
I did put jython.jar on the classpath. Comparing my other Python UDFs with this one, I found that the ones that work do not import anything. Could that be the cause? Do I need to do anything extra to import a module in my UDF?
Thanks!
Shawn
-
Re: python udf doesnt work
Richard Ding 2011-01-26, 01:46
You're right. There are two issues here. First, the Jython script needs to be able to locate the modules on its search path (e.g. python.path). If you have the right environment variable set, the Jython script should be able to find and import the module. Second, Pig currently doesn't automatically ship the module file to the backend, so even if you set the search path in the frontend, the backend still cannot locate the module.
Finally, there is an incompatibility between Python modules and Jython modules: you need to use the Jython modules that come with the Jython installation (in its Lib directory).
We're looking into these issues and hoping to provide a solution in the next release.
Thanks,
-Richard
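To illustrate Richard's first point: Jython's python.path property plays roughly the role of CPython's PYTHONPATH, i.e. its entries are added to sys.path, so a script can also extend sys.path itself before importing. The sketch below is a minimal helper under that assumption; import_from is a hypothetical name, not a Pig or Jython API. As Richard notes, it only changes the search path of the interpreter it runs in, so on a cluster the module file would still have to exist at the same path on the backend nodes.

```python
import os
import sys


def import_from(directory, module_name):
    """Hypothetical helper: put `directory` on sys.path, then import a
    top-level module by name.

    This only affects the interpreter it runs in (e.g. the Pig frontend);
    it does not ship any files to backend tasks.
    """
    directory = os.path.abspath(directory)
    if directory not in sys.path:
        # Prepend so modules in this directory are found before later entries.
        sys.path.insert(0, directory)
    # __import__ exists in both Jython 2.5 and CPython; for a top-level
    # (non-dotted) name it returns the module object itself.
    return __import__(module_name)
```

For example, a UDF file could call import_from("/opt/jython/Lib", "re") before using re, where /opt/jython/Lib is a placeholder for wherever the Jython installation's Lib directory actually lives.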
-
Re: python udf doesnt work
Julien Le Dem 2011-01-26, 18:01
As a workaround, in Jython you can also use the Java classes. Something like this (not tested):
from java.util.regex import *
from java.lang import *

@outputSchema("y:bag{t:tuple(word:chararray)}")
def strsplittobag(content, regex):
    return Pattern.compile(regex).split(content)
Julien
-
Re: python udf doesnt work
Xiaomeng Wan 2011-01-26, 21:41
It works! I only needed to explicitly convert the result into a bag of tuples:

from java.util.regex import *
from java.lang import *

@outputSchema("y:bag{t:tuple(word:chararray)}")
def strsplittobag(content, regex):
    toks = Pattern.compile(regex).split(content)
    outBag = []
    for tok in toks:
        tup = (tok,)  # one-element tuple holding the word
        outBag.append(tup)
    return outBag
Thank you all!
Shawn
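Shawn's fix reflects how a Python UDF's return value maps onto the declared schema: the bag is returned as a list and each tuple as a Python tuple. The sketch below packages the same idea so the file can also be loaded under plain CPython for quick local testing. It assumes Pig makes outputSchema available as a global (as the scripts in this thread rely on) and defines a hypothetical stand-in decorator only when that name is missing; the helper _split is likewise illustrative. Under Jython it uses java.util.regex.Pattern, as Julien suggested, and otherwise falls back to the re module.

```python
try:
    # Under Jython (e.g. inside Pig) the Java regex classes are importable.
    from java.util.regex import Pattern

    def _split(content, regex):
        # Pattern.split returns a Java String[]; list() makes a Python list.
        return list(Pattern.compile(regex).split(content))
except ImportError:
    # Under CPython (local testing) fall back to the standard re module.
    import re

    def _split(content, regex):
        return re.split(regex, content)

try:
    outputSchema  # assumed to be provided as a global when run by Pig
except NameError:
    # Hypothetical stand-in so the file can be imported outside Pig:
    # it only records the schema string on the function.
    def outputSchema(schema):
        def decorate(func):
            func.outputSchema = schema
            return func
        return decorate


@outputSchema("y:bag{t:tuple(word:chararray)}")
def strsplittobag(content, regex):
    if content is None:
        return None
    # A bag is a list; wrap each word in a one-element tuple.
    return [(tok,) for tok in _split(content, regex)]
```

The two branches are not guaranteed to agree on every input: Java's Pattern.split drops trailing empty strings and does not return the text of capturing groups, while Python's re.split keeps trailing empty strings and includes captured group text in its result.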