Error when run python streaming
Dongliang Sun 2013-01-23, 05:58
Hi All,
I have run into a problem when running Python streaming from Pig.
My script imports a third-party module, 'pandas'.
It works when I run the Python code directly, and also when I run the Pig script in local mode.
But it fails when I run the Pig script in MapReduce mode. To debug, I commented out all of the code except for printing one line, and it still does not work. When I comment out the 'import pandas' line, it works.
'import numpy' causes no problem at all.
So is there anything I am missing in the Pig script that is needed to use pandas from Python?
Thanks, Dongliang
Re: Error when run python streaming
Thomas Bach 2013-01-23, 14:19
On Wed, Jan 23, 2013 at 01:58:29PM +0800, Dongliang Sun wrote:
> I import a third-party module 'Pandas'.
>
> It's successful when I directly run the python code.
> Also successful when run the pig script in local mode.
>
> But has error when run pig script in MapReduce, to debug I comment all of
> the code expect printing out one line.
> Still does not work.
> When I comment the 'import pandas', it works.
Is pandas installed in a virtual environment? If so, the problem is probably that Pig/Hadoop starts your job in a completely fresh environment: the Python interpreter is invoked from, e.g., /usr/bin, knows nothing about the packages installed in the site-packages directory of your virtual environment, and so the import fails.
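One way to check this is to have the streaming script report which interpreter and search path it actually sees on the cluster. The following is only a rough sketch, not something taken from your setup: the function name `describe_environment` is made up for illustration, and it assumes that whatever the script writes to stderr ends up in the task logs of the job.

```python
import sys


def describe_environment(module_name="pandas"):
    """Return lines describing the interpreter and whether module_name imports.

    The function name and the default module are illustrative assumptions.
    """
    lines = [
        "executable: %s" % sys.executable,
        "prefix: %s" % sys.prefix,
    ]
    # Every directory the interpreter searches when importing modules.
    lines.extend("sys.path entry: %s" % entry for entry in sys.path)
    try:
        __import__(module_name)
        lines.append("import %s: ok" % module_name)
    except ImportError as exc:
        lines.append("import %s: FAILED (%s)" % (module_name, exc))
    return lines


if __name__ == "__main__":
    # Write to stderr so the report does not mix with the streamed output.
    for line in describe_environment():
        sys.stderr.write(line + "\n")
```

If the executable or the sys.path entries reported in MapReduce mode differ from what you get when you run the same script by hand, that would point to the environment rather than to the Pig script.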
How do you invoke your script? Does it start with a shebang? How is the script installed? Can you also provide the full traceback? You should find it in the error logs of the job.
Can you boil the code down to a minimal example that fails, for both the Python and the Pig code?
Regards, Thomas Bach.