Neil Kodner 2012-11-01, 14:05
Re: Regexp character classes clarification
Jan Dolinár 2012-11-01, 14:32
Have you tried testing your regexes in Java? I have been using one of the
applets available on the web (e.g.
) to test my expressions before running a Hive query, and it helped me.
Usually all you need is double escaping, such as:
select regexp_extract("abc def ghj","\\s(.*)\\s",1) from test limit 1;
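This returns the string "def": index 1 selects the first capture group, which
is the text between the two whitespace characters. Index 0 would return the
whole match, " def ", including the surrounding spaces.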
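If you would rather check a pattern locally than in an applet, a small Java program along these lines may help. It is only a sketch: regexpExtract is a hypothetical helper written for this example, not Hive's own code, and returning an empty string when nothing matches is my assumption about how regexp_extract behaves. The point is that the Java string literal "\\s" reaches the regex engine as \s, which is the same escaping the Hive query above needs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexpExtractCheck {

    // Hypothetical stand-in for regexp_extract(subject, regex, index):
    // returns the requested group of the first match, or "" if nothing
    // matches (assumed behaviour, not taken from the Hive source).
    static String regexpExtract(String subject, String regex, int index) {
        Matcher m = Pattern.compile(regex).matcher(subject);
        return m.find() ? m.group(index) : "";
    }

    public static void main(String[] args) {
        String input = "abc def ghj";

        // "\\s" in Java source is the two-character regex \s (whitespace).
        System.out.println("[" + regexpExtract(input, "\\s(.*)\\s", 1) + "]"); // [def]

        // Index 0 is the whole match, including the surrounding spaces.
        System.out.println("[" + regexpExtract(input, "\\s(.*)\\s", 0) + "]"); // [ def ]

        // If the backslash is lost, the pattern looks for the letter s,
        // which does not occur in the input, so nothing is extracted.
        System.out.println("[" + regexpExtract(input, "s(.*)s", 1) + "]");     // []

        // Other predefined classes need the same double escaping, e.g. \w.
        System.out.println("[" + regexpExtract(input, "(\\w+)", 1) + "]");     // [abc]
    }
}
```

If the pattern behaves as expected here, copying the same doubly escaped string into the Hive query should give the same result.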
On Thu, Nov 1, 2012 at 3:05 PM, Neil Kodner <[EMAIL PROTECTED]> wrote:
> From the hive docs on regexp_extract:
> Note that some care is necessary in using predefined character classes:
> using '\s' as the second argument will match the letter s; '
> s' is necessary to match whitespace, etc. The 'index' parameter is the Java
> regex Matcher group() method index. See
> docs/api/java/util/regex/Matcher.html for more information on the 'index' or
> Java regex group() method.
> This is confusing, especially the line break after s; '. Can anyone explain
> whether character classes work under regexp_extract?
> I'm asking because I've been having some trouble implementing regular
> expression extracts using character classes such as \w. These regular
> expressions are working in some other environments but I can't get them to
> work correctly in hive.
Neil Kodner 2012-11-01, 19:11