NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, cassandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Hive >> mail # user >> Regexp character classes clarification


Neil Kodner 2012-11-01, 14:05
Re: Regexp character classes clarification
Hi Neil,

Have you tried testing your regexes in Java? Before running a Hive query I
test my expressions with one of the regex applets available on the web (e.g.
http://www.cis.upenn.edu/~matuszek/General/RegexTester/regex-tester.html),
and it has helped me a lot.
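A few lines of plain Java can also serve as a local tester. The sketch below is a hypothetical helper (`regexpExtract` is our own name, not a Hive API). It assumes, as the Hive docs describe, that regexp_extract runs a java.util.regex Matcher and returns group(index) of the first match. It also assumes an empty string is returned when nothing matches, so check that against your Hive version.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexTester {
    // Hypothetical helper approximating Hive's regexp_extract(subject, pattern, index):
    // find the first match and return the requested capture group.
    static String regexpExtract(String subject, String regex, int index) {
        Matcher m = Pattern.compile(regex).matcher(subject);
        if (m.find()) {
            String g = m.group(index);
            return g == null ? "" : g; // unmatched optional group -> "" (assumption)
        }
        return ""; // no match at all -> "" (assumed Hive-like behaviour)
    }

    public static void main(String[] args) {
        // The regex here is the text the engine sees after string-literal unescaping.
        System.out.println("[" + regexpExtract("abc def ghj", "\\s(.*)\\s", 1) + "]"); // [def]
    }
}
```

Because Java source strings need the same doubled backslash as Hive string literals, a pattern that works here can usually be pasted into the Hive query unchanged.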

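Usually all you need is double escaping. The pattern is first unescaped as a
Hive string literal, so '\\s' reaches the Java regex engine as \s (whitespace),
while '\s' arrives as a plain s. For example:
    select regexp_extract("abc def ghj","\\s(.*)\\s",1) from test limit 1;
This returns "def", the text captured by group 1 between the two spaces; the
whole match (group 0) would be " def ".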
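The Java sketch below compares the regex that the doubly escaped literal produces, \s(.*)\s, with the one a singly escaped literal collapses to, s(.*)s, following the Hive docs' statement that '\s' matches the letter s. It models that unescaping step by hand rather than calling Hive, and `firstGroup` is a hypothetical helper name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapingDemo {
    // Returns group `index` of the first match, or null if the pattern never matches.
    static String firstGroup(String subject, String regex, int index) {
        Matcher m = Pattern.compile(regex).matcher(subject);
        return m.find() ? m.group(index) : null;
    }

    public static void main(String[] args) {
        String subject = "abc def ghj";
        // Hive literal "\\s(.*)\\s" -> regex \s(.*)\s : \s is the whitespace class.
        String whitespaceRegex = "\\s(.*)\\s";
        // Hive literal "\s(.*)\s" -> regex s(.*)s : \s has collapsed to the letter s.
        String letterRegex = "s(.*)s";

        System.out.println(firstGroup(subject, whitespaceRegex, 1)); // def
        System.out.println(firstGroup(subject, letterRegex, 1));     // null (no 's' in subject)
    }
}
```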

Best regards,
Jan
On Thu, Nov 1, 2012 at 3:05 PM, Neil Kodner <[EMAIL PROTECTED]> wrote:
> From the hive docs on regexp_extract:
>
> Note that some care is necessary in using predefined character classes:
> using '\s' as the second argument will match the letter s; '
> s' is necessary to match whitespace, etc. The 'index' parameter is the Java
> regex Matcher group() method index. See
> docs/api/java/util/regex/Matcher.html for more information on the 'index' or
> Java regex group() method.
>
> This is confusing, especially the line break after s; '. Can anyone explain
> whether character classes work under regexp_extract?
>
> I'm asking because I've been having trouble writing regular expression
> extracts that use character classes such as \w. These regular expressions
> work in other environments, but I can't get them to work correctly in
> Hive.
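Other predefined classes such as \w follow the same rule. They work once the backslash survives the string literal, i.e. when written '\\w' in the Hive query. The Java check below is a minimal sketch. The sample input, the `leadingWord` helper and the HiveQL in the comment are illustrative assumptions, not taken from the thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordClassDemo {
    // Same regex text Hive would receive from the (hypothetical) literal '^(\\w+)\\s'.
    private static final Pattern LEADING_WORD = Pattern.compile("^(\\w+)\\s");

    // Roughly mirrors: regexp_extract(line, '^(\\w+)\\s', 1)
    static String leadingWord(String line) {
        Matcher m = LEADING_WORD.matcher(line);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        // \w matches letters, digits and '_', so the underscore stays in the word.
        System.out.println(leadingWord("user_42 logged in")); // user_42
        // No whitespace after the word, so the pattern does not match.
        System.out.println(leadingWord("hello").isEmpty());   // true
    }
}
```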
Neil Kodner 2012-11-01, 19:11