Re: Error when run python streaming
On Wed, Jan 23, 2013 at 01:58:29PM +0800, Dongliang Sun wrote:
> I import a third-party module, 'pandas'.
> It works when I run the Python code directly.
> It also works when I run the Pig script in local mode.
> But it fails when I run the Pig script in MapReduce mode. To debug, I
> commented out all of the code except for printing one line.
> It still does not work.
> When I comment out the 'import pandas', it works.

Is pandas installed in a virtual environment? If so, the problem is
probably that Pig/Hadoop starts your job in a completely fresh
environment: the Python interpreter is invoked from e.g. /usr/bin,
knows nothing about the packages installed in the site-packages
directory of your virtual environment, and so the import fails.
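
One way to check this is to have the streaming script report which
interpreter it is actually running under. The following is only a
minimal sketch, assuming Python 3; the function name
describe_environment is made up for illustration. It writes to stderr
because stdout is the data stream that Pig reads back.

```python
import importlib.util
import sys


def describe_environment(module_name="pandas"):
    """Describe the running interpreter and whether module_name can be found.

    Nothing is imported here; find_spec only looks the module up on sys.path.
    """
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,   # which python binary is running
        "prefix": sys.prefix,           # differs inside a virtual environment
        "path": list(sys.path),         # where imports are searched
        "module_found": spec is not None,
        "module_origin": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    # stderr ends up in the task logs; stdout would pollute the streamed data.
    for key, value in describe_environment().items():
        sys.stderr.write("%s: %r\n" % (key, value))
```

If the executable or prefix reported from the MapReduce job differs
from what you see when running the script by hand, that would support
the fresh-environment explanation.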

How do you invoke your script? Does it start with a shebang? How is
the script installed? Can you also provide the full traceback? You
should find it in the error logs of the job.
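
If the traceback is hard to locate, the script can write it out
explicitly. Again just a sketch under assumptions (Python 3, and a
hypothetical helper name import_or_report), not a statement of how
your script is structured:

```python
import importlib
import sys
import traceback


def import_or_report(module_name, stream=sys.stderr):
    """Import module_name; on failure write the full traceback to stream.

    Returns the imported module, or None if the import failed.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Record which interpreter failed, then the complete traceback.
        stream.write("import of %r failed under %s\n"
                     % (module_name, sys.executable))
        stream.write(traceback.format_exc())
        return None


if __name__ == "__main__":
    pandas = import_or_report("pandas")
    if pandas is None:
        # Non-zero exit so the streaming task is marked as failed.
        sys.exit(1)
```

This only catches ImportError; any other exception raised while
importing would still propagate and appear in the job logs as usual.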

Can you boil the code down to a minimal example that fails, for both
the Python and the Pig code?

Thomas Bach.