Pig, mail # user - Possible Pig 9.1 globing bug in parameter substitution


Re: Possible Pig 9.1 globing bug in parameter substitution
Corbin Hoenes 2011-12-28, 19:27
I've located my problem.  I believe it comes down to a difference in the
classpath between 0.9.0 and 0.9.1.  It may be somewhat machine-dependent, as
a lot of these jars are probably found dynamically by the /bin/pig script,
which has changed quite a bit since 0.9.0.  While debugging, it looked like
GenericOptionsParser was the culprit, so maybe the classpath differences
caused a different version of this class to get loaded.
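
The classpath difference mentioned above is easier to see if the two
java.class.path strings are split into one entry per line and compared.
What follows is only a sketch: the function name cp_only_in_second and
the variables CP_090 and CP_091 are made up for illustration, and it
assumes a POSIX-like shell with mktemp, tr, sed, sort and comm available.

```shell
# Print the classpath entries that appear in the second classpath
# but not in the first. Both arguments are colon-separated strings.
cp_only_in_second() {
    old_list=$(mktemp)
    new_list=$(mktemp)
    # One entry per line, empty entries (from "::") dropped, sorted for comm.
    printf '%s\n' "$1" | tr ':' '\n' | sed '/^$/d' | sort -u > "$old_list"
    printf '%s\n' "$2" | tr ':' '\n' | sed '/^$/d' | sort -u > "$new_list"
    # comm -13 hides lines unique to the first file and lines common to both.
    comm -13 "$old_list" "$new_list"
    rm -f "$old_list" "$new_list"
}
```

With the two printed classpaths stored in CP_090 and CP_091 (hypothetical
names), running cp_only_in_second "$CP_090" "$CP_091" would list the entries
that only the 0.9.1 launch picked up, which is where I would start looking
for a second copy of GenericOptionsParser.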

Anyway, the short of it is that I have to escape the asterisk (*) character
in my globbing pattern.

=== Let's do it with 0.9.0 ===
$ /usr/lib/pig-0.9.0/bin/pig -d INFO \
    -p in_file='/chukwa/repos/Insight-Demo/' \
    -p process_glob='20111226/*/*/*.evt' \
    -p out_file='dashboard-daily-2011-12-26' \
    -p in_file1='dashboard-daily-2011-12-26' \
    -p out_file1='dashboard-daily-2011-12-26' \
    -p current_date_num='20111226' \
    -p timeperiod='1' \
    ap.pig

(System.out.println() calls added for effect)
0.9.0 java.class.path  /etc/hbase:/usr/lib/pig-0.9.0/bin/../conf:/usr/java/default/lib/tools.jar:/usr/lib/pig-0.9.0/bin/../build/classes:/usr/lib/pig-0.9.0/bin/../build/test/classes:/usr/lib/pig-0.9.0/bin/../pig-0.9.0-core.jar:/usr/lib/pig-0.9.0/bin/../build/pig-0.9.1-SNAPSHOT.jar:/usr/lib/pig-0.9.0/bin/../lib/automaton.jar:/etc/hadoop/conf:/usr/lib/hadoop/hadoop-core-0.20.2-cdh3u2.jar:/usr/lib/hadoop/lib/hadoop-lzo-0.4.9.jar

Parameter found: in_file=/chukwa/repos/Insight-Demo/
Parameter found: process_glob=20111226/*/*/*.evt
Parameter found: out_file=dashboard-daily-2011-12-26
Parameter found: in_file1=dashboard-daily-2011-12-26
Parameter found: out_file1=dashboard-daily-2011-12-26
Parameter found: current_date_num=20111226
Parameter found: timeperiod=1

=== Now with 0.9.1 ===
$ /usr/lib/pig-0.9.1/bin/pig -d INFO \
    -p in_file='/chukwa/repos/Insight-Demo/' \
    -p process_glob='20111226/*/*/*.evt' \
    -p out_file='dashboard-daily-2011-12-26' \
    -p in_file1='dashboard-daily-2011-12-26' \
    -p out_file1='dashboard-daily-2011-12-26' \
    -p current_date_num='20111226' \
    -p timeperiod='1' \
    ap.pig

(System.out.println() calls added for effect)
0.9.1 java.class.path /usr/lib/hadoop-0.20/conf:/usr/java/default/lib/tools.jar:/usr/lib/hadoop-0.20:/usr/lib/hadoop-0.20/hadoop-core-0.20.2-cdh3u2.jar:/usr/lib/hadoop-0.20/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20/lib/aspectjrt-1.6.5.jar:/usr/lib/hadoop-0.20/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20/lib/cloudera-desktop-plugins-0.3.0.jar:/usr/lib/hadoop-0.20/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20/lib/commons-daemon-1.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20/lib/commons-logging-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-net-1.4.1.jar:/usr/lib/hadoop-0.20/lib/core-3.1.1.jar:/usr/lib/hadoop-0.20/lib/hadoop-capacity-scheduler-0.20.2-cdh3u0-SNAPSHOT.jar:/usr/lib/hadoop-0.20/lib/hadoop-fairscheduler-0.20.2-cdh3u2.jar:/usr/lib/hadoop-0.20/lib/hadoop-lzo-0.4.9.jar:/usr/lib/hadoop-0.20/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20/lib/jackson-core-asl-1.5.2.jar:/usr/lib/hadoop-0.20/lib/jackson-mapper-asl-1.5.2.jar:/usr/lib/hadoop-0.20/lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20/lib/jetty-6.1.26.cloudera.1.jar:/usr/lib/hadoop-0.20/lib/jetty-servlet-tester-6.1.26.cloudera.1.jar:/usr/lib/hadoop-0.20/lib/jetty-util-6.1.26.cloudera.1.jar:/usr/lib/hadoop-0.20/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20/lib/junit-4.5.jar:/usr/lib/hadoop-0.20/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20/lib/log4j-1.2.15.jar:/usr/lib/hadoop-0.20/lib/mockito-all-1.8.2.jar:/usr/lib/hadoop-0.20/lib/oro-2.0.8.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-20081211.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hadoop-0.20/lib/slf4j-api-1.4.3.jar:/usr/lib/hadoop-0.20/lib/slf4j-log4j12-1.4.3.jar:/usr/lib/hadoop-0.20/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-2.1.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-api-2.1.jar:/etc/hbase:/usr/lib/pig-0.9.1/bin/../conf:/usr/java/default/lib/tools.jar:/etc/hadoop/conf:/usr/lib/hadoop/hadoop-core-0.20.2-cdh3u2.jar:/usr/lib/hadoop/lib/hadoop-lzo-0.4.9.jar:/usr/lib/pig-0.9.1/bin/../lib/automaton.jar:/usr/lib/pig-0.9.1/bin/../lib/jython-2.5.0.jar:/usr/lib/pig-0.9.1/bin/../pig-withouthadoop.jar::/usr/local/hbase/hbase-0.90.4.jar:/usr/local/hbase/lib/zookeeper-3.3.2.jar:/usr/local/hbase/conf:/usr/local/hbase/hbase-0.90.4.jar:/usr/local/hbase/lib/zookeeper-3.3.2.jar:/usr/local/hbase/conf

Parameter found: in_file=/chukwa/repos/Insight-Demo/
Parameter found: null
Parameter found: out_file=dashboard-daily-2011-12-26
Parameter found: in_file1=dashboard-daily-2011-12-26
Parameter found: out_file1=dashboard-daily-2011-12-26
Parameter found: current_date_num=20111226
Parameter found: timeperiod=1

With 0.9.1 the second parameter, "process_glob", isn't parsed correctly (it
shows up as null above), and its asterisks now need to be escaped like this:

$ /usr/lib/pig-0.9.1/bin/pig -d INFO \
    -p in_file='/chukwa/repos/Insight-Demo/' \
    -p process_glob='20111226/\*/\*/\*.evt' \
    -p out_file='dashboard-daily-2011-12-26' \
    -p in_file1='dashboard-daily-2011-12-26' \
    -p out_file1='dashboard-daily-2011-12-26' \
    -p current_date_num='20111226' \
    -p timeperiod='1' \
    ap.pig
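
If the glob is built by a wrapper script rather than typed by hand, the
escaping shown above could be applied with a small helper. This is a sketch
only: escape_glob is a name I made up, it handles nothing but the asterisk,
and whether the escaping is needed at all presumably depends on which
classpath your pig launch ends up with.

```shell
# Backslash-escape every asterisk in a glob pattern.
escape_glob() {
    printf '%s\n' "$1" | sed 's/\*/\\*/g'
}

process_glob=$(escape_glob '20111226/*/*/*.evt')
printf '%s\n' "$process_glob"   # prints 20111226/\*/\*/\*.evt
```

The escaped value would then be passed as -p "process_glob=$process_glob",
with the double quotes keeping the shell from expanding or altering it.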
On Tue, Dec 27, 2011 at 6:15 PM, Aniket Mokashi <[EMAIL PROTECTED]> wrote: