Pig >> mail # user >> Trouble with REGEX in PIG


Re: Trouble with REGEX in PIG
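It's not valid Pig Latin.

The Grunt shell doesn't let you try out functions and UDFs the way you're
trying to use them. A bare call such as
REGEX_EXTRACT('192.168.1.5:8020', '(.*):(.*)', 1); isn't a statement, so
the parser tries to expand it as a macro. That is where ERROR 1200
("Cannot expand macro 'REGEX_EXTRACT'") comes from.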

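    -- Load each line of the 'data' file as a single chararray field
    A = LOAD 'data' USING PigStorage() as (ip: chararray);
    -- Keep only capture group 1 (the host part before the colon)
    B = FOREACH A GENERATE REGEX_EXTRACT(ip, '(.*):(.*)', 1);
    DUMP B;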

You always have to load a dataset and work with that dataset (or datasets).
You can create a file called 'data' (per the above script), put
"192.168.1.5:8020" in it, and try the above set of commands in the
grunt shell.
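
If you'd rather sanity-check the pattern outside Pig first, here's a rough
Java sketch of what that REGEX_EXTRACT call should return for the sample
line. It assumes the built-in applies the pattern with java.util.regex
using Matcher.find() and returns null when nothing matches; RegexExtractDemo
and regexExtract are made-up names for illustration, not part of Pig.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch of what REGEX_EXTRACT(ip, '(.*):(.*)', 1) computes for one line.
 * Assumption: Pig's built-in uses java.util.regex with Matcher.find();
 * this is an illustration, not Pig's own source.
 */
public class RegexExtractDemo {

    // Hypothetical helper: returns the requested capture group,
    // or null when the input is null or the pattern does not match.
    static String regexExtract(String input, String regex, int index) {
        if (input == null) {
            return null;
        }
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        return m.group(index);
    }

    public static void main(String[] args) {
        String line = "192.168.1.5:8020"; // contents of the 'data' file
        String pattern = "(.*):(.*)";     // same pattern as the Pig script

        System.out.println(regexExtract(line, pattern, 1));            // prints 192.168.1.5
        System.out.println(regexExtract(line, pattern, 2));            // prints 8020
        System.out.println(regexExtract("no-colon-here", pattern, 1)); // prints null
    }
}
```

Because .* is greedy, group 1 runs up to the last colon in the line, so
input like "a:b:c" gives "a:b". For a plain host:port value that is exactly
the host.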

On Wed, Dec 4, 2013 at 10:15 AM, Ankit Bhatnagar <[EMAIL PROTECTED]> wrote:

> Are you planning to use org.apache.pig.builtin.REGEX_EXTRACT?
>
> On 12/4/13 9:28 AM, "Watrous, Daniel" <[EMAIL PROTECTED]> wrote:
>
> >Hi,
> >
> >I'm trying to use regular expressions in PIG, but it's failing. Based on
> >the documentation
> >http://pig.apache.org/docs/r0.12.0/func.html#regex-extract I am trying
> >this:
> >
> >[watrous@c0003913 ~]$ pig -x local
> >which: no hadoop in (/opt/krb5/sbin/64:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/X11R6/bin:/sbin:/usr/sbin:/usr/bin:/opt/pb/bin:/opt/perf/bin:/bin:/usr/local/bin:/home/watrous/bin:/home/watrous/pig-0.12.0/bin)
> >2013-12-04 17:15:15,398 [main] INFO  org.apache.pig.Main - Apache Pig version 0.12.0 (r1529718) compiled Oct 07 2013, 12:20:14
> >2013-12-04 17:15:15,398 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/watrous/pig_1386177315394.log
> >2013-12-04 17:15:15,425 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /home/watrous/.pigbootup not found
> >2013-12-04 17:15:15,599 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
> >grunt> REGEX_EXTRACT('192.168.1.5:8020', '(.*):(.*)', 1);
> >2013-12-04 17:16:59,753 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 1> Cannot expand macro 'REGEX_EXTRACT'. Reason: Macro must be defined before expansion.
> >Details at logfile: /home/watrous/pig_1386177315394.log
> >
> >
> >Here's the relevant bit from the log file:
> >Pig Stack Trace
> >---------------
> >ERROR 1200: <line 1> Cannot expand macro 'REGEX_EXTRACT'. Reason: Macro must be defined before expansion.
> >
> >Failed to parse: <line 1> Cannot expand macro 'REGEX_EXTRACT'. Reason: Macro must be defined before expansion.
> >        at org.apache.pig.parser.PigMacro.macroInline(PigMacro.java:455)
> >        at org.apache.pig.parser.QueryParserDriver.inlineMacro(QueryParserDriver.java:298)
> >        at org.apache.pig.parser.QueryParserDriver.expandMacro(QueryParserDriver.java:287)
> >        at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:180)
> >        at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1648)
> >        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1621)
> >        at org.apache.pig.PigServer.registerQuery(PigServer.java:575)
> >        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1093)
> >        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:501)
> >        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
> >        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
> >        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
> >        at org.apache.pig.Main.run(Main.java:541)
> >        at org.apache.pig.Main.main(Main.java:156)
> >
> >I attempted to define the macro (following this tutorial
> >http://aws.amazon.com/articles/2729). However, piggybank.jar doesn't
> >define org.apache.pig.piggybank.evaluation.string.EXTRACT, so I located
> >the most likely file in the current version of the jar.
> >
> >grunt> register /home/watrous/pig-0.12.0/contrib/piggybank/java/piggybank.jar
> >grunt> DEFINE REGEX_EXTRACT org.apache.pig.piggybank.evaluation.string.RegexExtract;