Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> Issues with Hive 'Like" and multiple '%' wildcard in a pattern


Copy link to this message
-
Re: Issues with Hive 'Like" and multiple '%' wildcard in a pattern
yeah. good one.   its a bug alright.     where the trouble starts is when a
'?'  follows a wildcard char.  '%' in this case.  Looks like hive slavishly
mutates a '%' to a '.*' and since you got a '?' after that you get another
valid regex looking like '.*?' ('?" being the minimally match directive)
so it eats your '?' as meta-data and then things just go weird from there.

 in your case #2:

       '%?' (aka '.*?')  matches '?&' in your string
       'icid=main' matches 'icid=main'
       '%dl' matches 'moiz%dl'
       '%' matches nothing but thats ok

so you get a true.

*
*
i don't know of a workaround using the like operator.  i've tried various
number of blackslashes in front of the question mark to no avail to make it
a literal question mark and not a meta question mark. maybe someone else on
this list more knowledgeable on the internals can prescribe a solution.

in the meantime use rlike and a real regexp.  use double backslashes to
escape meta characters.

consider this regexp in hive:  'your_string' rlike  '.*\\?icid=main.*dl.*'
as equivalent to: 'your_string' like '%?icid=main%dl%'

let us know how it turns out.

Thanks again for finding and reporting this.

thanks,
Stephen.
PS  Note that these are only _my_ findings and  as such i can certainly be
wrong so please feel free to verify or disprove. I'm using hive 0.80 which
is kinda old.
On Thu, Jun 13, 2013 at 12:30 PM, Sunderlin, Mark <
[EMAIL PROTECTED]> wrote:

> Hive users: It took me a while to create a standalone use case, but here
> we go:
>
> In Oracle, this SQL returns false for both cases:
>
> SELECT CASE
>           WHEN '?&icid=mainmoiz%dl' LIKE ('%?icid=main%') THEN 'T'
>           ELSE 'F'
>        END
>           AS case1,
>        CASE
>           WHEN '?&icid=mainmoiz%dl' LIKE ('%?icid=main%dl%') THEN 'T'
>           ELSE 'F'
>        END
>           AS case2
>   FROM DUAL
> case1 case2
> F       F
>
> But in Hive 0.80, I get one False and one True:
>
> SELECT CASE
>           WHEN '?&icid=mainmoiz%dl' LIKE ('%?icid=main%') THEN 'T'
>           ELSE 'F'
>        END
>           AS case1,
>        CASE
>           WHEN '?&icid=mainmoiz%dl' LIKE ('%?icid=main%dl%') THEN 'T'
>           ELSE 'F'
>        END
>           AS case2
> FROM DUAL - A dummy single row, single column table we installed
> case1   case2
> F       T
>
>
>
> ---
> Mark E. Sunderlin
> Solutions Architect |AOL NETWORKS BDM
> P: 703-265-6935     |C: 540-327-6222 | AIM: MESunderlin
> 22000 AOL Way | Dulles, VA | 20166
>
> From: Stephen Sprague [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, June 12, 2013 2:22 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Issues with Hive 'Like" and multiple '%' wildcard in a pattern
>
> Hi Mark,
> i'm running v0.80 too and multiple '%'s work as expected for me. so. we're
> gonna need a see a definitive test case from you.
> show your full string and show where the like clause fails to match.
> thanks,
> Stephen.
> PS here's my test:
>
> hisql>select city from junk;
> +------------------+
> | city             |
> +------------------+
> | West Haven       |
> | West Haven       |
> | West Haven       |
> | West Haven       |
> | West Jordan      |
> | West Chester     |
> | Westminster      |
> | West Chester     |
> | West Chester     |
> | Westlake         |
> | West Chester     |
> | West Chester     |
> | West Chester     |
> | Weston           |
> | West Covina      |
> | West Chester     |
> | West Chester     |
> | West Hollywood   |
> | West Des Moines  |
> | West Covina      |
> | Westland         |
> | West Valley City |
> | West Valley City |
> | West Valley City |
> | West Valley City |
> | Westminster      |
> | West Memphis     |
> | West Memphis     |
> +------------------+
> 28 affected
>
> hisql>select city from junk where city like '%West%v%';
> +-------------+
> | city        |
> +-------------+
> | West Haven  |
> | West Haven  |
> | West Haven  |
> | West Haven  |
> | West Covina |
> | West Covina |
> +-------------+
> 6 affected
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB