We debated this a bit internally. I'm not extracting a group per se; I'm parsing a ZIP code from the end of a postal address field as follows, and it works fine:
REGEX_EXTRACT(address,'[\\d-]+$',0) AS zip
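Here is a minimal Java sketch of why index 0 works for this pattern. It assumes that REGEX_EXTRACT calls `Matcher.find()` and returns `Matcher.group(index)`, or null when the index exceeds the group count. That is my understanding of Pig's implementation, but treat it as an assumption. The pattern `[\d-]+$` has no capturing groups, so 0 (the whole match) is the only valid index. The `regexExtract` helper and the sample address are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZipExtract {
    // Hypothetical stand-in for Pig's REGEX_EXTRACT(input, regex, index):
    // find the first match, then return the requested group, or null if
    // there is no match or the pattern has fewer groups than `index`.
    static String regexExtract(String input, String regex, int index) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find() || index > m.groupCount()) {
            return null;
        }
        return m.group(index);
    }

    public static void main(String[] args) {
        String address = "100 Main St, Springfield, IL 62704-1234";
        // Same pattern as the Pig example: '\\d' in both Pig and Java literals yields \d.
        String pattern = "[\\d-]+$";
        System.out.println(regexExtract(address, pattern, 0)); // 62704-1234
        System.out.println(regexExtract(address, pattern, 1)); // null: the pattern has no group 1
    }
}
```

So index 0 returns the entire matched text, not the "first group".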
From: Vitalii Tymchyshyn [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, October 09, 2013 6:33 PM
To: [EMAIL PROTECTED]
Subject: Re: Documentation bug in REGEX_EXTRACT
Well, usually in regexps, match 0 is the whole match and groups start from 1. Are you sure you are getting a group (the part in brackets) with 0?
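A small Java sketch of this numbering, using a pattern that does have a capturing group. It assumes Pig's regex handling follows `java.util.regex`, which is plausible because Pig runs on the JVM, but is not confirmed here. Index 0 is the whole match and index 1 is the text inside the parentheses. The pattern, sample string, and `firstMatch` helper are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupIndexDemo {
    // Returns {group 0, group 1} for the first match of regex in input,
    // or null if nothing matches. Assumes the regex has at least one group.
    static String[] firstMatch(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(0), m.group(1) };
    }

    public static void main(String[] args) {
        // Group 1 captures only the five-digit ZIP, not the "-1234" suffix.
        String[] parts = firstMatch("Springfield, IL 62704-1234", "(\\d{5})-\\d{4}$");
        System.out.println("group 0: " + parts[0]); // group 0: 62704-1234
        System.out.println("group 1: " + parts[1]); // group 1: 62704
    }
}
```

By the same reasoning, `REGEX_EXTRACT(address, '(\\d{5})-\\d{4}$', 1)` in Pig would presumably return just the five-digit ZIP. That is consistent with the documented 1-based numbering for groups.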
On 8 Oct. 2013 at 13:04, user "Steve Bernstein" <[EMAIL PROTECTED]>
> Apologies if this is captured elsewhere. In the Pig 0.11.1
> documentation for the builtin function REGEX_EXTRACT (
> http://pig.apache.org/docs/r0.11.1/func.html#regex-extract), the third
> parameter is the index of the matched group to return. The
> documentation says this is a "1-based parameter". That's incorrect; it's zero-based.
> E.g., to get the first match instance, I used: