Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Hive >> mail # user >> Issues with Hive 'Like" and multiple '%' wildcard in a pattern


+
Sunderlin, Mark 2013-06-12, 13:27
+
Stephen Sprague 2013-06-12, 17:00
+
Sunderlin, Mark 2013-06-12, 17:26
+
Stephen Sprague 2013-06-12, 18:21
+
Sunderlin, Mark 2013-06-13, 19:30
Copy link to this message
-
Re: Issues with Hive 'Like" and multiple '%' wildcard in a pattern
yeah. good one.   its a bug alright.     where the trouble starts is when a
'?'  follows a wildcard char.  '%' in this case.  Looks like hive slavishly
mutates a '%' to a '.*' and since you got a '?' after that you get another
valid regex looking like '.*?' ('?" being the minimally match directive)
so it eats your '?' as meta-data and then things just go weird from there.

 in your case #2:

       '%?' (aka '.*?')  matches '?&' in your string
       'icid=main' matches 'icid=main'
       '%dl' matches 'moiz%dl'
       '%' matches nothing but thats ok

so you get a true.

*
*
i don't know of a workaround using the like operator.  i've tried various
number of blackslashes in front of the question mark to no avail to make it
a literal question mark and not a meta question mark. maybe someone else on
this list more knowledgeable on the internals can prescribe a solution.

in the meantime use rlike and a real regexp.  use double backslashes to
escape meta characters.

consider this regexp in hive:  'your_string' rlike  '.*\\?icid=main.*dl.*'
as equivalent to: 'your_string' like '%?icid=main%dl%'

let us know how it turns out.

Thanks again for finding and reporting this.

thanks,
Stephen.
PS  Note that these are only _my_ findings and  as such i can certainly be
wrong so please feel free to verify or disprove. I'm using hive 0.80 which
is kinda old.
On Thu, Jun 13, 2013 at 12:30 PM, Sunderlin, Mark <
[EMAIL PROTECTED]> wrote:

> Hive users: It took me a while to create a standalone use case, but here
> we go:
>
> In Oracle, this SQL returns false for both cases:
>
> SELECT CASE
>           WHEN '?&icid=mainmoiz%dl' LIKE ('%?icid=main%') THEN 'T'
>           ELSE 'F'
>        END
>           AS case1,
>        CASE
>           WHEN '?&icid=mainmoiz%dl' LIKE ('%?icid=main%dl%') THEN 'T'
>           ELSE 'F'
>        END
>           AS case2
>   FROM DUAL
> case1 case2
> F       F
>
> But in Hive 0.80, I get one False and one True:
>
> SELECT CASE
>           WHEN '?&icid=mainmoiz%dl' LIKE ('%?icid=main%') THEN 'T'
>           ELSE 'F'
>        END
>           AS case1,
>        CASE
>           WHEN '?&icid=mainmoiz%dl' LIKE ('%?icid=main%dl%') THEN 'T'
>           ELSE 'F'
>        END
>           AS case2
> FROM DUAL - A dummy single row, single column table we installed
> case1   case2
> F       T
>
>
>
> ---
> Mark E. Sunderlin
> Solutions Architect |AOL NETWORKS BDM
> P: 703-265-6935     |C: 540-327-6222 | AIM: MESunderlin
> 22000 AOL Way | Dulles, VA | 20166
>
> From: Stephen Sprague [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, June 12, 2013 2:22 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Issues with Hive 'Like" and multiple '%' wildcard in a pattern
>
> Hi Mark,
> i'm running v0.80 too and multiple '%'s work as expected for me. so. we're
> gonna need a see a definitive test case from you.
> show your full string and show where the like clause fails to match.
> thanks,
> Stephen.
> PS here's my test:
>
> hisql>select city from junk;
> +------------------+
> | city             |
> +------------------+
> | West Haven       |
> | West Haven       |
> | West Haven       |
> | West Haven       |
> | West Jordan      |
> | West Chester     |
> | Westminster      |
> | West Chester     |
> | West Chester     |
> | Westlake         |
> | West Chester     |
> | West Chester     |
> | West Chester     |
> | Weston           |
> | West Covina      |
> | West Chester     |
> | West Chester     |
> | West Hollywood   |
> | West Des Moines  |
> | West Covina      |
> | Westland         |
> | West Valley City |
> | West Valley City |
> | West Valley City |
> | West Valley City |
> | Westminster      |
> | West Memphis     |
> | West Memphis     |
> +------------------+
> 28 affected
>
> hisql>select city from junk where city like '%West%v%';
> +-------------+
> | city        |
> +-------------+
> | West Haven  |
> | West Haven  |
> | West Haven  |
> | West Haven  |
> | West Covina |
> | West Covina |
> +-------------+
> 6 affected