Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # dev >> Review Request: Add Vectorized Substr


Copy link to this message
-
Re: Review Request: Add Vectorized Substr

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11106/#review20512
-----------------------------------------------------------

ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStart.java
<https://reviews.apache.org/r/11106/#comment42276>

    please add javadoc comment for purpose of class

ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStart.java
<https://reviews.apache.org/r/11106/#comment42278>

    put comment for why this is here

ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStart.java
<https://reviews.apache.org/r/11106/#comment42280>

    explain more clearly what this function does

ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStart.java
<https://reviews.apache.org/r/11106/#comment42281>

    if you use a negative start index -n, the existing Hive code (non-vectorized) seems to take the tail end total n characters. e.g. substr("foo", -2) is "oo". If you use -n and n is greater than the string length, the output is the empty string. Please handle this case with the same behavior as non vectorized hive.

ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStart.java
<https://reviews.apache.org/r/11106/#comment42282>

    please run ant checkstyle and follow suggestions, e.g. there is no blank after if before (

ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStart.java
<https://reviews.apache.org/r/11106/#comment42283>

    also set output value to empty string if output is null
    
    outV.noNulls needs to get set for every case and doesn't get set here

ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStart.java
<https://reviews.apache.org/r/11106/#comment42289>

    len[0] - offset could be negative.
    
    Do you need to use len[0] - (start[0] - offset)?
    
    Make sure you have unit tests for case where start != 0.

ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStart.java
<https://reviews.apache.org/r/11106/#comment42296>

    I've heard that if the common case is the first case, things can run faster. You could reverse these and make the test for offset <> -1.

ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStartLen.java
<https://reviews.apache.org/r/11106/#comment42305>

    need to handle negative start index case. It appears your code could get array out of bounds in that case.
    
    Also, for substrLength <= 0, result should be empty string

ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStartLen.java
<https://reviews.apache.org/r/11106/#comment42308>

    should set isRepeating to false for default case

ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStartLen.java
<https://reviews.apache.org/r/11106/#comment42306>

    need to check noNulls first. If noNulls then you can't look into isNull array or you could see invalid data

ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStartLen.java
<https://reviews.apache.org/r/11106/#comment42309>

    Is this output supposed to be null or empty string?
    
    My add-hoc tests seemed to show it was empty string. You should double check.

ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStartLen.java
<https://reviews.apache.org/r/11106/#comment42310>

    need to set outV.isNull[I] to true or false always, unless you are setting outV.noNulls to true.

ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorStringExpressions.java
<https://reviews.apache.org/r/11106/#comment42311>

    second argument to VEctorizedRowBatch constructor should not be use (it defaults to correct value)

ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorStringExpressions.java
<https://reviews.apache.org/r/11106/#comment42314>

    argument is not needed -- use default constructor with no args

ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorStringExpressions.java
<https://reviews.apache.org/r/11106/#comment42312>

    you need to test for some data with multi-byte characters. There is an example of that someplace else in the tests.

ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorStringExpressions.java
<https://reviews.apache.org/r/11106/#comment42315>

    need to try to test the case where data start position is not 0

ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorStringExpressions.java
<https://reviews.apache.org/r/11106/#comment42313>

    need to verify the other rows besides 0 are always set to not null. The isNull entries for them could have been true by chance from a previous use of the batch
- Eric Hanson
On May 13, 2013, 9:54 p.m., Timothy Chen wrote: