Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Hive >> mail # dev >> Review Request: Vectorized Timestamp functions for long nanosecond based timestamps


+
Gopal V 2013-05-29, 23:36
Copy link to this message
-
Re: Review Request: Vectorized Timestamp functions for long nanosecond based timestamps

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11530/#review21203
-----------------------------------------------------------

ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFTimestampFieldLong.java
<https://reviews.apache.org/r/11530/#comment44042>

    Sun Java style convention is to have a blank line before all comments. I don't personally mind but that's what I've been asked to do :-).

ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFTimestampFieldLong.java
<https://reviews.apache.org/r/11530/#comment44039>

    Please use full sentences starting with caps and ending with period.

ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFTimestampFieldLong.java
<https://reviews.apache.org/r/11530/#comment44038>

    Can you confirm this expression will be constant-folded by the compiler? Otherwise this should be evaluated by hand in advance.

ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFTimestampFieldLong.java
<https://reviews.apache.org/r/11530/#comment44040>

    If you know in what cases ms can be negative, can you add that to the comment? That seems unusual.
    
    In think this because if the time is negative (before the epoch) you could get negative nanos. So you want to convert before creating the timestamp. Is that right? Please elaborate a little in the comment.

ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFTimestampFieldLong.java
<https://reviews.apache.org/r/11530/#comment44041>

    Would it be possible to re-use a timestamp that belongs to the class here, rather than calling new? If so, please do that to speed this up. I think you can do what you need with setTime() and setNanos().
    
    Eliminating new() in the inner loop of vector processing tends to speed things up a lot.

ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFWeekOfYearLong.java
<https://reviews.apache.org/r/11530/#comment44044>

    Please explain why constant 4 is correct here and what it means.

ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFYearLong.java
<https://reviews.apache.org/r/11530/#comment44049>

    Nice work! Good, compact code to initialize year boundaries. No use of new() or calls to heavy calendar methods in the inner loop. Awesome! :-)

ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFYearLong.java
<https://reviews.apache.org/r/11530/#comment44045>

    can you use minYear and maxYear here instead of literals?

ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFYearLong.java
<https://reviews.apache.org/r/11530/#comment44047>

    can you use minYear + 1 and maxYear instead of literals?

ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFYearLong.java
<https://reviews.apache.org/r/11530/#comment44051>

    Given that this function is moderately heavy anyway (with the binary search) I think making it virtual will not slow things down much.
    
    But if it gets faster we should seriously consider creating a separate evaluate method for VectorUDFYearLong and make this a static function to avoid virtual method call overhead.
    

ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorUDFYearLong.java
<https://reviews.apache.org/r/11530/#comment44050>

    Did you consider calculating approximate year using something like
    
    approxYear = yearBase + (int)(time / nanosPerYear) - 1;
    
    and then linearly search forward to find year boundary? I wonder if that would be faster than binary search.

ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorTimestampExpressions.java
<https://reviews.apache.org/r/11530/#comment44063>

    Overall this is a good set of unit tests! It's quite comprehensive. Thanks.

ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorTimestampExpressions.java
<https://reviews.apache.org/r/11530/#comment44054>

    Can you comment this function to explain how you are using long[] inputs? I think I understand but a comment would help.
    
    

ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorTimestampExpressions.java
<https://reviews.apache.org/r/11530/#comment44061>

    if inputs.length == 1 (which I think it often does in your tests), then I % 1 is always 0. So you are always loading up input vectors with all 0. Is there a reason for this? If so, please explain, or if not, consider a wider range of inputs.

ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorTimestampExpressions.java
<https://reviews.apache.org/r/11530/#comment44052>

    what does /*begin-macro*/ mean?

ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorTimestampExpressions.java
<https://reviews.apache.org/r/11530/#comment44053>

    I'm trying to standardize on using "batch" instead of "vrg" for batches. vrg is used in a lot of places but g stands for group not batch so it is confusing.

ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorTimestampExpressions.java
<https://reviews.apache.org/r/11530/#comment44060>

    here you know !(vrg.cols[in].noNulls || vrg.cols[in].isNull[i] == false) ==>
    !noNulls and isNull[I] == true
    
    But you are checking for the value of isNull[I] for "in" in the assert which is a bit misleading.
    
    Maybe test against true in the else branch?
- Eric Hanson
On May 29, 2013, 11:36 p.m., Gopal V wrote:
+
Gopal V 2013-05-31, 13:58
+
Gopal V 2013-05-31, 13:59
+
Eric Hanson 2013-05-31, 18:26