Pig >> mail # user >> Trouble with REGEX in PIG


RE: Trouble with REGEX in PIG
Pradeep,

Does the documentation here need to be updated: http://pig.apache.org/docs/r0.12.0/func.html#regex-extract

It suggests that the function can be called directly on a string literal and should return the extracted value.

I did confirm that I can use REGEX_EXTRACT on values loaded from a file.

Thank you,
Daniel

-----Original Message-----
From: Pradeep Gollakota [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, December 04, 2013 11:28 AM
To: [EMAIL PROTECTED]
Subject: Re: Trouble with REGEX in PIG

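It's not valid PigLatin...

The Grunt shell doesn't let you try out functions and UDFs the way you're trying to use them. A bare call like that gets parsed as a macro invocation, which is why you see the "Cannot expand macro" error. The function has to be used inside a statement: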

    A = LOAD 'data' USING PigStorage() as (ip: chararray);
    B = FOREACH A GENERATE REGEX_EXTRACT(ip, '(.*):(.*)', 1);
    DUMP B;

You always have to load a dataset and work with said dataset(s).
You can create a file called 'data' (per the above script), put "192.168.1.5:8020" in it, and try the above set of commands in the Grunt shell.
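
If you only want to check what the pattern captures before wiring it into a script, a plain Java regex check works too. This is a minimal sketch, assuming REGEX_EXTRACT follows java.util.regex semantics for this pattern; the class name RegexExtractSketch and the helper regexExtract are made up for illustration and are not part of Pig.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExtractSketch {
    // Hypothetical helper (not a Pig API): returns the requested capture
    // group, or null when the pattern does not match the input.
    public static String regexExtract(String input, String regex, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        String value = "192.168.1.5:8020";
        String regex = "(.*):(.*)";
        // Group 1 is everything before the colon, group 2 everything after it.
        System.out.println(regexExtract(value, regex, 1)); // prints 192.168.1.5
        System.out.println(regexExtract(value, regex, 2)); // prints 8020
    }
}
```

This only tells you what the regex itself does; inside Pig you still need the LOAD / FOREACH form above.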

On Wed, Dec 4, 2013 at 10:15 AM, Ankit Bhatnagar <[EMAIL PROTECTED]> wrote:

> Are you planning to use org.apache.pig.builtin.REGEX_EXTRACT?
>
> On 12/4/13 9:28 AM, "Watrous, Daniel" <[EMAIL PROTECTED]> wrote:
>
> >Hi,
> >
> >I'm trying to use regular expressions in Pig, but it's failing. Based
> >on the documentation at
> >http://pig.apache.org/docs/r0.12.0/func.html#regex-extract I am trying
> >this:
> >
> >[watrous@c0003913 ~]$ pig -x local
> >which: no hadoop in (/opt/krb5/sbin/64:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/X11R6/bin:/sbin:/usr/sbin:/usr/bin:/opt/pb/bin:/opt/perf/bin:/bin:/usr/local/bin:/home/watrous/bin:/home/watrous/pig-0.12.0/bin)
> >2013-12-04 17:15:15,398 [main] INFO  org.apache.pig.Main - Apache Pig version 0.12.0 (r1529718) compiled Oct 07 2013, 12:20:14
> >2013-12-04 17:15:15,398 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/watrous/pig_1386177315394.log
> >2013-12-04 17:15:15,425 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /home/watrous/.pigbootup not found
> >2013-12-04 17:15:15,599 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
> >grunt> REGEX_EXTRACT('192.168.1.5:8020', '(.*):(.*)', 1);
> >2013-12-04 17:16:59,753 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 1> Cannot expand macro 'REGEX_EXTRACT'. Reason: Macro must be defined before expansion.
> >Details at logfile: /home/watrous/pig_1386177315394.log
> >
> >Here's the relevant bit from the log file:
> >Pig Stack Trace
> >---------------
> >ERROR 1200: <line 1> Cannot expand macro 'REGEX_EXTRACT'. Reason: Macro must be defined before expansion.
> >
> >Failed to parse: <line 1> Cannot expand macro 'REGEX_EXTRACT'. Reason: Macro must be defined before expansion.
> >        at org.apache.pig.parser.PigMacro.macroInline(PigMacro.java:455)
> >        at org.apache.pig.parser.QueryParserDriver.inlineMacro(QueryParserDriver.java:298)
> >        at org.apache.pig.parser.QueryParserDriver.expandMacro(QueryParserDriver.java:287)
> >        at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:180)
> >        at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1648)
> >        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1621)
> >        at org.apache.pig.PigServer.registerQuery(PigServer.java:575)
> >        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1093)
> >        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:501)
> >        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
> >        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
> >        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
> >        at org.apache.pig.Main.run(Main.java:541)