

Teddy Choi 2013-05-17, 14:36
Re: Review Request: HIVE-4548 Speed up vectorized LIKE filter for special cases abc%, %abc and %abc%

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11222/#review20736
-----------------------------------------------------------

ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/FilterStringColLikeStringScalar.java
<https://reviews.apache.org/r/11222/#comment42827>

    Teddy,
    
    Overall this looks good!
    Because your code determines once per vector which special-case function to call, rather than deciding that in the inner loop, I don't think you need to create a templatized version of this. That would not significantly improve performance.
    
    Please add unit tests for your string pattern classification function that cover all the different types of patterns.
    
    Eric
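
To illustrate the point about choosing the special-case function once per vector, here is a minimal sketch. It is not the code from the patch under review: the class name LikePatternSketch, the Kind enum, the Checker interface and the filter signature are all hypothetical, escape handling is left out, and lambdas are used only for brevity. It classifies the pattern, picks a byte-level checker once, and keeps the inner loop free of any per-row decision.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch; none of these names come from the patch under review. */
final class LikePatternSketch {

  /** The pattern shapes from the review title (abc, abc%, %abc, %abc%) plus a fallback. */
  enum Kind { NONE, BEGIN, END, MIDDLE, COMPLEX }

  /** Tests one value stored in bytes[start .. start + len). */
  interface Checker {
    boolean check(byte[] bytes, int start, int len);
  }

  /** Classifies a LIKE pattern. Anything with '_', '\\' or an inner '%' is treated as COMPLEX. */
  static Kind classify(String p) {
    boolean startsPct = p.startsWith("%");
    boolean endsPct = p.length() > 1 && p.endsWith("%");
    String body = p.substring(startsPct ? 1 : 0, endsPct ? p.length() - 1 : p.length());
    if (body.isEmpty() || body.indexOf('%') >= 0
        || body.indexOf('_') >= 0 || body.indexOf('\\') >= 0) {
      return Kind.COMPLEX;
    }
    if (startsPct && endsPct) return Kind.MIDDLE;
    if (startsPct) return Kind.END;
    if (endsPct) return Kind.BEGIN;
    return Kind.NONE;
  }

  /** True if bytes[start ..] begins with lit. The caller guarantees the range is in bounds. */
  static boolean regionEquals(byte[] bytes, int start, byte[] lit) {
    for (int i = 0; i < lit.length; i++) {
      if (bytes[start + i] != lit[i]) return false;
    }
    return true;
  }

  /** Naive byte search; returns the first index of lit in the range, or -1. */
  static int indexOf(byte[] bytes, int start, int len, byte[] lit) {
    for (int i = start; i + lit.length <= start + len; i++) {
      if (regionEquals(bytes, i, lit)) return i;
    }
    return -1;
  }

  /** Picks the checker for a non-complex pattern. Comparing raw UTF-8 bytes is enough here. */
  static Checker checkerFor(String pattern) {
    Kind kind = classify(pattern);
    // For non-complex kinds the only '%' characters are the leading/trailing ones.
    final byte[] lit = pattern.replace("%", "").getBytes(StandardCharsets.UTF_8);
    switch (kind) {
      case NONE:
        return (b, s, n) -> n == lit.length && regionEquals(b, s, lit);
      case BEGIN:
        return (b, s, n) -> n >= lit.length && regionEquals(b, s, lit);
      case END:
        return (b, s, n) -> n >= lit.length && regionEquals(b, s + n - lit.length, lit);
      case MIDDLE:
        return (b, s, n) -> indexOf(b, s, n, lit) >= 0;
      default:
        throw new UnsupportedOperationException("complex patterns need a general matcher");
    }
  }

  /** Filters the first n values; writes matching row indices to selected and returns their count. */
  static int filter(byte[][] values, int n, String pattern, int[] selected) {
    Checker checker = checkerFor(pattern); // decided once per vector, not per row
    int count = 0;
    for (int i = 0; i < n; i++) {
      if (checker.check(values[i], 0, values[i].length)) {
        selected[count++] = i;
      }
    }
    return count;
  }
}
```

Keeping classify as a separate static method also makes it easy to unit test each pattern type directly, without building a vector first.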

ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/FilterStringColLikeStringScalar.java
<https://reviews.apache.org/r/11222/#comment42828>

    Please add a comment explaining what this function does and why it is done that way.

ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/FilterStringColLikeStringScalar.java
<https://reviews.apache.org/r/11222/#comment42822>

    The comment marker // here has no comment text after it.

ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/FilterStringColLikeStringScalar.java
<https://reviews.apache.org/r/11222/#comment42826>

    The style guide says to put a blank before and after the = assignment operator.
    
    Please run ant checkstyle.
    
    Overall the style looks good though!

- Eric Hanson

On May 17, 2013, 2:36 p.m., Teddy Choi wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11222/
> -----------------------------------------------------------
>
> (Updated May 17, 2013, 2:36 p.m.)
>
>
> Review request for hive.
>
>
> Description
> -------
>
> I edited FilterStringColLikeStringScalar.java as Eric Hanson suggested.
>
> For non-complex patterns, it calls a static method that calls no other methods and works only on the byte arrays it is given. For complex patterns, it reuses a ByteBuffer and a CharBuffer when decoding UTF-8, to avoid constructing new objects.
>
> There is a 30% to 170% performance improvement for all cases. The benchmark results are at https://issues.apache.org/jira/browse/HIVE-4548#comment-13660750.
>
> It can still be made more efficient by using a template-driven approach. I'll apply it soon.
>
>
> This addresses bug HIVE-4548.
>     https://issues.apache.org/jira/browse/HIVE-4548
>
>
> Diffs
> -----
>
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/FilterStringColLikeStringScalar.java 24ba861
>
> Diff: https://reviews.apache.org/r/11222/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Teddy Choi
>
>
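
The quoted description mentions reusing a ByteBuffer and a CharBuffer to decode UTF-8 for complex patterns. A rough sketch of that idea follows. It is an assumption about how such reuse could look, not the code from the patch: the class name ReusableUtf8Matcher, the likeToRegex helper and the initial buffer size of 64 are made up for illustration, and LIKE escape sequences are not handled.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: decodes each value into reused buffers before regex matching. */
final class ReusableUtf8Matcher {
  private final Matcher matcher;
  private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
      .onMalformedInput(CodingErrorAction.REPLACE)
      .onUnmappableCharacter(CodingErrorAction.REPLACE);
  // Both buffers are allocated once and only replaced when a longer value arrives.
  private ByteBuffer bytes = ByteBuffer.allocate(64);
  private CharBuffer chars = CharBuffer.allocate(64);

  ReusableUtf8Matcher(String likePattern) {
    Pattern regex = Pattern.compile(likeToRegex(likePattern), Pattern.DOTALL);
    this.matcher = regex.matcher(""); // reused through reset(CharSequence)
  }

  /** Translates '%' to ".*" and '_' to "."; every other character is quoted literally. */
  static String likeToRegex(String like) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < like.length(); i++) {
      char c = like.charAt(i);
      if (c == '%') {
        sb.append(".*");
      } else if (c == '_') {
        sb.append('.');
      } else {
        sb.append(Pattern.quote(String.valueOf(c)));
      }
    }
    return sb.toString();
  }

  /** Returns true if the UTF-8 value in src[start .. start + len) matches the pattern. */
  boolean matches(byte[] src, int start, int len) {
    if (bytes.capacity() < len) {
      // UTF-8 decoding never yields more chars than input bytes, so len chars suffice.
      bytes = ByteBuffer.allocate(len);
      chars = CharBuffer.allocate(len);
    }
    bytes.clear();
    bytes.put(src, start, len);
    bytes.flip();
    chars.clear();
    decoder.reset();
    decoder.decode(bytes, chars, true);
    decoder.flush(chars);
    chars.flip();
    return matcher.reset(chars).matches();
  }
}
```

In this sketch the only per-row work is a copy into the byte buffer, the decode, and the match; no String, buffer, or Matcher is created per row unless a value is longer than any seen before.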
