Hive, mail # user - Regexp character classes clarification


Re: Regexp character classes clarification
Jan Dolinár 2012-11-01, 14:32
Hi Neil,

Have you tried testing your regexes directly in Java? I have been using
one of the regex-tester applets available on the web (e.g.
http://www.cis.upenn.edu/~matuszek/General/RegexTester/regex-tester.html)
to check my expressions before running a Hive query, and it has helped
me a lot.

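Usually all you need is double escaping, such as:
    select regexp_extract("abc def ghj","\\s(.*)\\s",1) from test limit 1;
This returns the string "def": index 1 selects only the text captured
between the two whitespace characters. Index 0 would return the whole
match, " def ", including the surrounding spaces.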
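
If you want to check the same thing in plain Java, here is a minimal sketch. The class name RegexEscapeDemo and the helper extract() are made up for illustration and are not Hive's implementation. The sketch assumes, as the docs describe, that Hive removes one level of backslashes from the string literal before the regex engine sees it. In Java source, "\\s" is the two-character regex \s, so it stands in for what Hive passes along when you write "\\s" in the query.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexEscapeDemo {
    // Hypothetical helper (not Hive's code): returns the requested group of
    // the first match, or an empty string when the pattern does not match.
    static String extract(String input, String regex, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(group) : "";
    }

    public static void main(String[] args) {
        String input = "abc def ghj";

        // Regex \s(.*)\s : group 1 is the text between the two whitespace characters.
        System.out.println("[" + extract(input, "\\s(.*)\\s", 1) + "]"); // [def]

        // Group 0 is the whole match, including the surrounding spaces.
        System.out.println("[" + extract(input, "\\s(.*)\\s", 0) + "]"); // [ def ]

        // Assumption from the docs: a single-escaped \s reaches the engine as a
        // plain "s". The input contains no letter s, so nothing matches.
        System.out.println("[" + extract(input, "s(.*)s", 1) + "]"); // []

        // Other predefined classes behave the same way: regex (\w+) captures
        // the first run of word characters.
        System.out.println("[" + extract(input, "(\\w+)", 1) + "]"); // [abc]
    }
}
```

So character classes such as \s and \w should work as long as the backslash is doubled in the query.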

Best regards,
Jan
On Thu, Nov 1, 2012 at 3:05 PM, Neil Kodner <[EMAIL PROTECTED]> wrote:
> From the hive docs on regexp_extract:
>
> Note that some care is necessary in using predefined character classes:
> using '\s' as the second argument will match the letter s; '
> s' is necessary to match whitespace, etc. The 'index' parameter is the Java
> regex Matcher group() method index. See
> docs/api/java/util/regex/Matcher.html for more information on the 'index' or
> Java regex group() method.
>
> This is confusing, especially the line break after s; '. Can anyone explain
> whether character classes work under regexp_extract?
>
> I'm asking because I've been having some trouble implementing regular
> expression extracts using character classes such as \w. These regular
> expressions are working in some other environments but I can't get them to
> work correctly in hive.