Pig >> mail # user >> Error when run python streaming


Dongliang Sun 2013-01-23, 05:58
Re: Error when run python streaming
On Wed, Jan 23, 2013 at 01:58:29PM +0800, Dongliang Sun wrote:
> I import a third-party module 'Pandas'.
>
> It's successful when I directly run the python code.
> Also successful when run the pig script in local mode.
>
> But it fails when I run the Pig script in MapReduce mode. To debug, I
> commented out all of the code except printing out one line.
> It still does not work.
> When I comment out the 'import pandas', it works.

Is Pandas installed in a virtual environment? If so, the problem is
probably that Pig/Hadoop starts your job in a completely fresh
environment: the Python interpreter is invoked from e.g. /usr/bin,
knows nothing about the packages installed in the site-packages
directory of your virtual environment, and so the import fails.
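
One way to check this is to have the streamed script report which
interpreter and search path it actually runs with. What follows is
only a sketch, not something taken from your setup: it assumes
Python 3 (importlib.util.find_spec does not exist in Python 2), and
the function names environment_report and print_report are made up
for illustration. It writes to stderr because stdout carries the
streamed data.

```python
import importlib.util
import sys


def environment_report(module_name="pandas"):
    """Collect facts about the interpreter that is running this script.

    find_spec() only locates the module; it does not import it, so the
    report works even when the import itself would crash.
    """
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,    # e.g. /usr/bin/python vs. a venv path
        "prefix": sys.prefix,            # root of the active installation
        "sys_path": list(sys.path),      # directories searched on import
        "module_found": spec is not None,
        "module_location": getattr(spec, "origin", None),
    }


def print_report(module_name="pandas", stream=None):
    """Write the report to stderr so stdout stays clean for the data."""
    stream = sys.stderr if stream is None else stream
    for key, value in environment_report(module_name).items():
        stream.write("%s: %r\n" % (key, value))


if __name__ == "__main__":
    print_report()
```

If "executable" differs between local mode and MapReduce mode, or
"module_found" is False only on the cluster, that would point to the
environment rather than to your code.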

How do you invoke your script? Does it start with a shebang? How is
the script installed? Can you also provide the full traceback? You
should find it in the error logs of the job.

Can you boil down the code to a minimal example that fails? Both the
Python and the Pig code?
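
For the Python side, a minimal example could look roughly like the
sketch below. It is an assumption about what your script does, not a
copy of it: it simply echoes every input line and attempts the pandas
import inside main(), so that a failure is reported on stderr instead
of killing the script silently. The names passthrough and main are
hypothetical.

```python
import sys


def passthrough(lines, out):
    """Echo each input line unchanged and return the number of lines."""
    count = 0
    for line in lines:
        out.write(line.rstrip("\n") + "\n")
        count += 1
    return count


def main():
    # Import inside main() so a missing package produces a readable
    # message in the task's error log rather than a bare crash.
    try:
        import pandas  # noqa: F401  (imported only to test availability)
    except ImportError as exc:
        sys.stderr.write("import pandas failed: %s\n" % exc)
        return 1
    passthrough(sys.stdin, sys.stdout)
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

If this already fails in MapReduce mode but works locally, the script
logic can be ruled out.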

Regards,
Thomas Bach.