Pig, mail # user - Can't use python UDF in MapReduce mode


MiaoMiao 2012-07-20, 07:10
RE: Can't use python UDF in MapReduce mode
Duckworth, Will 2012-07-20, 11:42
I think you should take a look at this ticket:

https://issues.apache.org/jira/browse/PIG-2761

And this thread:

http://search-hadoop.com/m/gv0122Ls5N11&subj=Deserialization+error+when+using+Jython+UDF+in+Pig+0+10+script

Thanks.

Will Duckworth  Senior Vice President, Software Engineering  | comScore, Inc.(NASDAQ:SCOR)
-----Original Message-----
From: MiaoMiao [mailto:[EMAIL PROTECTED]]
Sent: Friday, July 20, 2012 3:11 AM
To: [EMAIL PROTECTED]
Subject: Can't use python UDF in MapReduce mode

Hi all,

I've been using Apache Pig for some ETL work, but today I ran into a strange problem when trying Python UDFs.

I borrowed an example from
http://sundaycomputing.blogspot.com/2011/01/python-udfs-from-pig-scripts.html

It works in local mode, but not in MapReduce mode.

My team has been using Pig for quite a while, so it would be really hard to drop it. I'd appreciate any help.

Below are my .py and .pig files and the errors that come up.

[Rufus@master1 ~] hadoop fs -cat /hdfs/testudf.txt

Deepak 22 India
Chaitanya 19 India
Sachin 36 India
Barack 50 USA
[Rufus@master1 ~] cat pyudf.py

#!/usr/bin/python
@outputSchema("line:chararray")
def split_into_fields(input_line):
        return input_line
[Rufus@master1 ~] cat pyudf.pig

REGISTER pyudf.py USING jython AS udf;
records = LOAD '/test/testudf.txt' USING PigStorage('\n') AS (input_line:chararray);
schema_records = FOREACH records GENERATE udf.split_into_fields(input_line);
DUMP schema_records;
local mode result:
(Deepak 22 India)
(Chaitanya 19 India)
(Sachin 36 India)
(Barack 50 USA)
MapReduce mode result:
2012-07-20 15:09:03,322 [main] INFO  org.apache.pig.Main - Apache Pig version 0.10.0 (r1328203) compiled Apr 19 2012, 22:54:12
2012-07-20 15:09:03,322 [main] INFO  org.apache.pig.Main - Logging error messages to: /root/pig_1342768143321.log
2012-07-20 15:09:03,518 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master1
2012-07-20 15:09:03,568 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: master1:54311
2012-07-20 15:09:03,630 [main] INFO  org.apache.pig.scripting.jython.JythonScriptEngine - created tmp python.cachedir=/tmp/pig_jython_7427830580471090032
*sys-package-mgr*: processing new jar, '/usr/java/jdk1.6.0_29/lib/tools.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/hadoop-core-0.20.2-Intel.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/ant-contrib-1.0b3.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/aspectjrt-1.6.5.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/aspectjtools-1.6.5.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/commons-cli-1.2.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/commons-codec-1.4.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/commons-daemon-1.0.1.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/commons-el-1.0.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/commons-httpclient-3.0.1.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/commons-httpclient-3.1.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/commons-logging-1.0.4.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/commons-logging-api-1.0.4.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/commons-net-1.4.1.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/core-3.1.1.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/derbyclient.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/derbytools.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/hadoop-fairscheduler-0.20.2-Intel.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/hsqldb-1.8.0.10.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/intel-hadoop-lzo-20110718111837.2bd0d5b.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/jackson-core-asl-1.5.2.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/jackson-mapper-asl-1.5.2.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/jasper-compiler-5.5.12.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/jasper-runtime-5.5.12.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/jets3t-0.6.1.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/jetty-6.1.26.patched.1.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/jetty-servlet-tester-6.1.26.patched.1.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/jetty-util-6.1.26.patched.1.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/jsch-0.1.42.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/junit-4.5.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/kfs-0.2.2.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/log4j-1.2.15.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/mockito-all-1.8.2.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/oro-2.0.8.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/servlet-api-2.5-20081211.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/servlet-api-2.5-6.1.14.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/slf4j-api-1.4.3.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/slf4j-log4j12-1.4.3.jar'
*s
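One way to narrow a problem like this down is to exercise the UDF logic outside Pig entirely. The sketch below is plain CPython, not the Jython runtime Pig uses, and rests on stated assumptions: `outputSchema` here is a hypothetical stub standing in for the decorator that Pig's Jython engine supplies when the script is registered, and `load_lines` / `run_pipeline` are hypothetical helpers that only approximate `LOAD ... USING PigStorage('\n')` (one single-field tuple per line) followed by `FOREACH ... GENERATE udf.split_into_fields(input_line)`. Since the script already works in local mode, a check like this mainly rules out the UDF body itself; it cannot reproduce anything specific to MapReduce mode, where the UDF runs inside map tasks on the cluster.

```python
"""Local dry run of the pyudf.py UDF outside Pig (a sketch, not Pig's runtime)."""


def outputSchema(schema):
    """Hypothetical stand-in for the decorator Pig's Jython engine provides.

    It only records the schema string on the function so the UDF can be
    defined and called from plain Python.
    """
    def decorate(func):
        func.output_schema = schema
        return func
    return decorate


@outputSchema("line:chararray")
def split_into_fields(input_line):
    # Same body as pyudf.py: the whole line is passed through unchanged.
    return input_line


def load_lines(text):
    """Rough approximation of LOAD ... USING PigStorage('\\n').

    Each non-blank line becomes a single-field tuple; skipping blank lines is a
    simplification, and Pig's own handling may differ.
    """
    return [(line,) for line in text.splitlines() if line]


def run_pipeline(text):
    """Apply the UDF to every record, like FOREACH ... GENERATE udf.split_into_fields(input_line)."""
    return [(split_into_fields(input_line),) for (input_line,) in load_lines(text)]


if __name__ == "__main__":
    sample = "Deepak 22 India\nChaitanya 19 India\nSachin 36 India\nBarack 50 USA\n"
    for record in run_pipeline(sample):
        # Mimic DUMP's "(field)" formatting for single-field tuples.
        print("(%s)" % record[0])
```

Run as a script, it prints the same four lines shown under the local mode result above, which is expected given that local mode already succeeded.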
MiaoMiao 2012-07-24, 05:55