Pig >> mail # user >> Documentation bug in REGEX_EXTRACT

RE: Documentation bug in REGEX_EXTRACT
We debated this a bit internally here. I'm not extracting a group per se. I'm parsing a zip code from the end of a postal address field as follows, and it works fine:

REGEX_EXTRACT(address,'[\\d-]+$',0) AS zip
-----Original Message-----
From: Vitalii Tymchyshyn [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, October 09, 2013 6:33 PM
Subject: Re: Documentation bug in REGEX_EXTRACT

Well, in regular expressions group 0 is usually the whole match, and capturing groups start from 1. Are you sure you are getting a group (the thing in parentheses) with 0?
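
To make the numbering convention concrete, here is a minimal sketch in Python rather than Pig. The helper `regex_extract` and the sample address are made up for illustration, and Python's `re` module is only a stand-in for whatever regex engine Pig uses, so treat this as an analogy and not as Pig's actual implementation.

```python
import re


def regex_extract(text, pattern, index):
    """Hypothetical helper: return group `index` of the first match, or None."""
    match = re.search(pattern, text)
    if match is None:
        return None
    return match.group(index)


# Made-up sample address, for illustration only.
address = "123 Main St, Springfield, IL 62704-1234"

# No parentheses in the pattern: only group 0 (the whole match) exists.
zip_whole = regex_extract(address, r"[\d-]+$", 0)

# Same pattern wrapped in parentheses: group 1 is the first capturing group.
zip_group = regex_extract(address, r"([\d-]+)$", 1)

print(zip_whole)
print(zip_group)
```

Both calls return the same text here, because the parentheses wrap the entire pattern. With a pattern that has no parentheses, index 0 is the only valid choice; in Python, asking for group 1 in that case raises an IndexError.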
On Oct 8, 2013 at 13:04, "Steve Bernstein" <[EMAIL PROTECTED]> wrote:

> Apologies if this is captured elsewhere.  In the Pig 0.11.1
> documentation for the builtin function REGEX_EXTRACT (
> http://pig.apache.org/docs/r0.11.1/func.html#regex-extract), the third
> parameter is the index of the matched group to return.  The
> documentation says this is a "1-based parameter". That's incorrect; it's zero-based.
> E.g., to get the first match instance I used:
> REGEX_EXTRACT(string,'regex',0)