Using a UDF written in Python
Jonathan Coveney 2010-12-27, 21:30

So I have module.py, and I want to be able to use it in a Pig script. It has no special imports or anything. I do have

    @outputSchemaFunction("output:chararray)

In my Pig script, I have this:

    register '/my/udf/location/udf.py' using jython as myfunc;

Is there any reason why this wouldn't work? Here is the error I get:

    2010-12-27 16:29:41,288 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org/python/util/PythonInterpreter

Not the most instructive error, but is there anything more I need to be doing to be able to use a Python UDF?

As an aside, are simple Python UDFs as efficient as Java ones? I like Python a lot and love the idea of being able to write UDFs in it, but I can use Java if necessary.

Re: Using a UDF written in Python
Aniket Mokashi 2010-12-28, 02:33

I think the decorator used here is incorrect. In general, "output:chararray" needs to be schema-string-compatible. Also, you are using "outputSchemaFunction", which is meant for a UDF whose output schema depends on its input schema (e.g. square), and it needs a companion function with the decorator "schemaFunction" (named "output" in your case). I think using the "outputSchema" decorator would fix the problem here.

More details can be found at
http://wiki.apache.org/pig/UDFsUsingScriptingLanguages

Thanks,
Aniket

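The distinction Aniket draws between the two decorators can be sketched as follows. This is a minimal illustration, not the wiki's exact text: the function name `squareSchema` is illustrative, and the fallback decorators at the top are stand-ins added here so the file also imports under plain Python. It assumes that Pig supplies its own `outputSchema`, `outputSchemaFunction`, and `schemaFunction` when the script is registered `using jython`.

```python
# Stand-ins so this file also imports under plain Python for local testing.
# Assumption: when Pig registers the script "using jython" it provides its
# own versions of these decorators, so the fallbacks are only used outside Pig.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            func.outputSchema = schema
            return func
        return wrap

    def outputSchemaFunction(schema_func_name):
        def wrap(func):
            func.outputSchemaFunction = schema_func_name
            return func
        return wrap

    def schemaFunction(name):
        def wrap(func):
            func.schemaFunction = name
            return func
        return wrap


# Fixed output schema: a single chararray field named "word".
@outputSchema("word:chararray")
def helloworld():
    return 'Hello, World'


# Output schema depends on the input schema, so the decorator names the
# function that computes it instead of giving a schema string.
@outputSchemaFunction("squareSchema")
def square(num):
    return num * num


# Companion schema function (illustrative name): the output schema is
# simply whatever the input schema was.
@schemaFunction("squareSchema")
def squareSchema(input_schema):
    return input_schema
```

Note that the argument to `outputSchemaFunction` is a function name rather than a schema string, which is why a value such as "output:chararray" does not fit there.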
Re: Using a UDF written in Python
Jonathan Coveney 2010-12-28, 03:14

Aniket, I appreciate you taking a look at this. In general, I found the documentation around outputSchema pretty confusing. For example, in this example

    @outputSchema("x:{t:(word:chararray)}")
    def helloworld():
        return ('Hello, World')

Then, in the sample script below that, you have

    @outputSchema("t:(numformat:chararray)")
    def commaFormat(num):
        return '{:,}'.format(num)

In this case, you have lost the x:{} (which makes more sense to me). Perhaps this is because the latter function is meant to operate on an input and return a type (t), whereas the hello world function should be able to stand alone, and thus has to return a bag? Not sure...

Besides that, though, I changed my code per your suggestion and tried

    @outputSchema("t:(word:chararray)")

and still got the error.

As a note, do I need to import anything in the Python script for outputSchema to work, or should it be fine since Pig is grabbing it?

Once again, I really appreciate your help in the matter. I feel that having people who weren't intimately involved in the project have a go at it is how you make it ultimately more usable and useful... but you have to answer some annoying questions on the way :P

Thanks again.

Re: Using a UDF written in Python
Jonathan Coveney 2010-12-28, 03:18

Oh, and just to be sure, I have tried

    @outputSchema("word:chararray")
    @outputSchema("x:{t:(word:chararray)}")

as well (the former of which seems to be the "right" one, once I can figure out what is wrong).

I've tested my code separately in Python and it is fine...

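The three schema shapes discussed above (a bare field, a tuple `t:(...)`, and a bag `x:{t:(...)}`) can be lined up against Python return values. This is a hedged sketch: it assumes the usual Jython mapping in Pig, where a Python string becomes a chararray, a Python tuple becomes a Pig tuple, and a Python list of tuples becomes a bag. The function names and the fallback decorator are illustrative and not taken from the thread.

```python
# Fallback so the file imports under plain Python; under Pig the decorator
# is assumed to be provided by the Jython script engine.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            func.outputSchema = schema
            return func
        return wrap


# Bare field: return a plain string.
@outputSchema("word:chararray")
def as_scalar(s):
    return s.upper()


# Tuple: return a Python tuple with one item per declared field.
@outputSchema("t:(word:chararray, length:int)")
def as_tuple(s):
    return (s, len(s))


# Bag of tuples: return a list of one-field tuples.
@outputSchema("b:{t:(word:chararray)}")
def as_bag(s):
    return [(w,) for w in s.split()]
```

One Python detail worth keeping in mind when reading the wiki examples: `('Hello, World')` is just a parenthesised string, not a tuple. A one-element tuple needs a trailing comma, as in `('Hello, World',)`.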
Re: Using a UDF written in Python
soren@... 2010-12-29, 17:52

Do you have Jython on your classpath? Currently Jython isn't distributed in the 0.8.0 release tarball.

--
http://about.me/soren/bio

Re: Using a UDF written in Python
Jonathan Coveney 2010-12-29, 17:57

I do have Jython installed and on PATH, but maybe I didn't include it in the right way? Where does it need to be?

Re: Using a UDF written in Python
soren@... 2010-12-29, 18:35

Try adding the full path to the jar via PIG_CLASSPATH like so:

    export PIG_CLASSPATH=/path/to/jython.jar

then run pig. Also, I assume you're doing your testing on a local machine? If it's on a cluster, you need to make sure Jython is on all the worker nodes and that the classpath is set up properly on all of them as well.

--
http://about.me/soren/bio

Re: Using a UDF written in Python
Jonathan Coveney 2010-12-29, 18:57

Ah, that might be it... my computer has it and I have it on my path; however, I do not know if the cluster has it. Definitely something to look into. Thanks.

Re: Using a UDF written in Python
Jonathan Coveney 2010-12-29, 19:32

OK, strangely enough, it won't run locally either. It sees the file, but it's giving me an interpreter-not-found error, so it must be something else.

PIG_CLASSPATH is equal to

    /home/jcoveney/usefulpig/conf:/home/jcoveney/jython

and here is my test script:

    register '/home/jcoveney/udfs/pytest.py' using jython as comp;

    the_in = LOAD 'input.txt' AS (thing:chararray);
    the_out = FOREACH the_out GENERATE comp.computation(thing)
    DUMP theout;

but I don't think it's getting that far... it's still giving me the same error. I'm just running it with "pig -x local script.pig".

Re: Using a UDF written in Python
Dmitriy Ryaboy 2010-12-29, 21:53

You need to set the classpath to include the literal jar paths, not just the directory that contains them.

Try /home/jcoveney/usefulpig/conf:/home/jcoveney/jython/***

D

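Dmitriy's point about listing the jars themselves rather than the directory that holds them can be sketched in Python. This is only an illustration and not part of Pig: the helper names `jars_in` and `build_classpath` are hypothetical, and the directories passed in are whatever applies on your machine.

```python
import os


def jars_in(directory):
    """Return the sorted full paths of the .jar files directly inside directory."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.endswith(".jar")
    )


def build_classpath(plain_entries, jar_dirs):
    """Join plain entries (e.g. a conf directory) with every jar found in jar_dirs.

    plain_entries are passed through unchanged; each directory in jar_dirs
    is replaced by the individual jar files it contains.
    """
    entries = list(plain_entries)
    for directory in jar_dirs:
        entries.extend(jars_in(directory))
    return os.pathsep.join(entries)
```

Calling `build_classpath(['/home/jcoveney/usefulpig/conf'], ['/home/jcoveney/jython'])` would give the conf directory followed by each jar found in the jython directory, which is the kind of string PIG_CLASSPATH is expected to hold.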
-
Re: Using a UDF written in Python
Jonathan Coveney 2010-12-29, 22:00
echo $PIG_CLASSPATH
/home/jcoveney/usefulpig/conf:/home/jcoveney/jython/***

Same error:

2010-12-29 16:59:29,862 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. Could not initialize class org.apache.pig.scripting.jython.JythonScriptEngine$Interpreter

:S

I really love that UDFs can be written in Python... thanks for helping me try to get there.
-
Re: Using a UDF written in Python
Jonathan Coveney 2010-12-29, 22:01
Wait, ignore that error, that was the wrong one.
This is it:

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org/python/util/PythonInterpreter

(I had set the classpath incorrectly, to *.* not ***)
-
Re: Using a UDF written in Python
soren@...) 2010-12-29, 22:50
I think you took Dmitriy a bit too literally ;)

You need to put the actual filenames of the jars into PIG_CLASSPATH. If /home/jcoveney/jython is the directory that contains jython.jar (used purely as an example, I'm not certain what the actual jar name is), then your PIG_CLASSPATH should echo to:

/home/jcoveney/jython/jython.jar

plus whatever other jars you want to include.

http://about.me/soren/bio
-
Re: Using a UDF written in Python
Jonathan Coveney 2010-12-29, 23:09
Haha gotcha, I am not the greatest at all this package management. I think we are getting close though... I added jython.jar, as well as my test.py file, and here is what I got when I ran it:

*sys-package-mgr*: processing new jar, '/home/jcoveney/pig-0.8.0/pig.jar'
*sys-package-mgr*: processing new jar, '/home/jcoveney/udfs/test.jar'
2010-12-29 17:56:47,118 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve compress.compressuid using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.]

(I got the Java thing because I ported my UDF to Java to see if it would be any easier...)

Here is the command I used to run it:

java -cp $PIGPATH/pig.jar:$PIG_CLASSPATH org.apache.pig.Main -x local udftest.pig

$PIG_CLASSPATH is set to:

/home/jcoveney/usefulpig/conf:/home/jcoveney/jython/jython.jar:/home/jcoveney/udfs/test.jar:/home/jcoveney/udfs/test.py

Now, whether I use the Python version or the Java version, I get an error (well, the first line only applies to the Python one):

init: Bootstrapping class not in Py.BOOTSTRAP_TYPES[class=class org.python.core.PyStringMap]
2010-12-29 18:03:02,967 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve test.test using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.]

Any ideas? I followed the UDF manual, but perhaps my naming or something is off? I have no idea. Would love any help you can throw at me...
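For reference on the naming question, a minimal Jython UDF file might look like the sketch below. It assumes, in line with the Pig wiki page linked earlier in the thread, that Pig supplies the `outputSchema` decorator when it loads a script via `register ... using jython`; the fallback definition is an addition here so the same file can be imported and tested in plain Python. The function name `computation` comes from the test script quoted earlier in the thread, and its body is a placeholder.

```python
# Under Pig's Jython engine the outputSchema decorator is assumed to be
# provided by Pig. Outside Pig the name is undefined, so supply a stand-in
# that records the schema string and returns the function unchanged.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def decorate(func):
            func.outputSchema = schema
            return func
        return decorate


@outputSchema("word:chararray")
def computation(thing):
    """Placeholder body: return the input upper-cased, or None for null input."""
    if thing is None:
        return None
    return thing.upper()
```

With `register '/home/jcoveney/udfs/pytest.py' using jython as comp;` the function is referenced as `comp.computation(thing)`: the name after `as` is the namespace, and the part after the dot is the Python function name. Whether a mismatch there is what produced the ERROR 1070 above is not established in this thread.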
-
Re: Using a UDF written in Python
Jonathan Coveney 2010-12-29, 23:12
Also, just in general, does EVERY UDF we want to load have to be added to the classpath when you call pig? And just the .jar/.py file, or more than that?
-
Re: Using a UDF written in Python
Dmitriy Ryaboy 2010-12-30, 00:37
All the dependencies have to be on the classpath, including the dependencies' dependencies...

D
-
Re: Using a UDF written in Python
Jonathan Coveney 2010-12-30, 02:11
Ok, I guess I'm just not used to these sorts of situations where the dependencies get so hairy.

1) For a simple UDF, what dependencies are these that need to be included?
2) Is there a semi-easy way to clean this up?

Thanks for your patience. I really am new to the whole dependencies game.
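One semi-easy way to keep the classpath manageable is to check each entry before launching Pig. The sketch below is an illustration only: `check_classpath` is a hypothetical helper, not a Pig tool. It relies on the JVM behaviour that classpath entries which are neither directories nor .jar/.zip archives are ignored for class loading, so a .py file on the classpath is flagged as such. An entry containing a wildcard would be reported as missing, since the helper looks for a literal path.

```python
import os


def check_classpath(classpath):
    """Classify each entry of a classpath string.

    Returns a list of (entry, status) pairs where status is one of
    'missing', 'directory', 'archive', or 'ignored'.
    """
    results = []
    for entry in classpath.split(os.pathsep):
        if not entry:
            continue  # skip empty segments such as a trailing separator
        if not os.path.exists(entry):
            status = "missing"
        elif os.path.isdir(entry):
            # Searched for .class and resource files; jars inside are not picked up.
            status = "directory"
        elif entry.lower().endswith((".jar", ".zip")):
            status = "archive"
        else:
            # For example a .py file: the JVM does not load classes from it.
            status = "ignored"
        results.append((entry, status))
    return results
```

Running `check_classpath(os.environ.get("PIG_CLASSPATH", ""))` before starting Pig lists which entries actually contribute jars, which are plain directories, and which are typos or leftovers.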
|