Dongliang Sun 2013-01-23, 05:58
Re: Error when run python streaming
Thomas Bach 2013-01-23, 14:19
On Wed, Jan 23, 2013 at 01:58:29PM +0800, Dongliang Sun wrote:
> I import a third-party module, 'Pandas'.
> It succeeds when I run the Python code directly.
> It also succeeds when I run the Pig script in local mode.
> But it fails when I run the Pig script in MapReduce mode. To debug, I commented
> out all of the code except for printing out one line.
> It still does not work.
> When I comment out the 'import pandas', it works.
Is Pandas installed in a virtual environment? If so, the problem is
probably that Pig/Hadoop starts your job in a completely fresh
environment: the Python interpreter is invoked from e.g. /usr/bin,
knows nothing about the packages installed in the site-packages
path of your virtual environment, and so the import fails.
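One way to check this is to ship a small diagnostic script with the job and
look at what it reports on the task nodes. The following is only a sketch,
assuming Python 3.4 or later is available on the nodes; the function name
describe_environment is made up for illustration. It writes to stderr, which
the Hadoop task logs typically capture, so the report does not end up in the
data stream on stdout.

```python
import importlib.util
import sys


def describe_environment(module_name="pandas"):
    """Return lines describing the interpreter and whether module_name can be found.

    describe_environment is a hypothetical helper name, not part of Pig or Hadoop.
    """
    # find_spec returns None when a top-level module is not on sys.path.
    spec = importlib.util.find_spec(module_name)
    lines = [
        "executable: %s" % sys.executable,
        "prefix: %s" % sys.prefix,
        "%s found at: %s" % (module_name, spec.origin if spec else "NOT FOUND"),
    ]
    lines.extend("sys.path entry: %s" % entry for entry in sys.path)
    return lines


if __name__ == "__main__":
    # Report on stderr so stdout stays free for the streamed records.
    for line in describe_environment():
        sys.stderr.write(line + "\n")
```

If the executable printed on the cluster differs from the one you get when
running the script by hand, or pandas shows up as NOT FOUND, the job is not
using the interpreter you expect.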
How do you invoke your script? Does it start with a shebang? How is
the script installed? Can you also provide the full traceback? You
should find it in the error logs of the job.
Can you boil the code down to a minimal example that fails, for both
the Python and the Pig code?