Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive, mail # dev - Review Request 14221: HIVE-4113: Optimize select count(1) with RCFile and Orc


Copy link to this message
-
Re: Review Request 14221: HIVE-4113: Optimize select count(1) with RCFile and Orc
Ashutosh Chauhan 2013-09-19, 18:49

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14221/#review26274
-----------------------------------------------------------
Few comments, mostly around unnecessary null checks, which I think are no longer required, now that column pruning will always be happening.
Secondly, I think we should be representing column list as LinkedHashSet instead of List.
ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java
<https://reviews.apache.org/r/14221/#comment51336>

    Seems like this null check is now redundant. List can never be null.

ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java
<https://reviews.apache.org/r/14221/#comment51341>

    If above is true, this else can also be removed.

ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java
<https://reviews.apache.org/r/14221/#comment51339>

    Seems like this null check is now redundant. Can we remove this?

ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java
<https://reviews.apache.org/r/14221/#comment51340>

    If above is true, than this can also be removed.

serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java
<https://reviews.apache.org/r/14221/#comment51337>

    Seems like this method is called only from tests, no one actually uses it. I will suggest just to remove this method altogether to minimize. appendReadColumnIds() can readily be used in place of this and that is what Hive uses everywhere.

serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java
<https://reviews.apache.org/r/14221/#comment51338>

    This method is also only called either from previous method and from test. Once we remove previous one, I don't think we need to introduce this new method.

serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java
<https://reviews.apache.org/r/14221/#comment51355>

    ids should be of type LinkedHashSet<Integer> instead of list.

serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java
<https://reviews.apache.org/r/14221/#comment51343>

    Seems like this null check is no longer needed.

serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java
<https://reviews.apache.org/r/14221/#comment51344>

    I don't think old can be null at this point either. We should remove this null check.

serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java
<https://reviews.apache.org/r/14221/#comment51356>

    Simlarly here cols should be of type LinkedHashSet<String>

serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java
<https://reviews.apache.org/r/14221/#comment51345>

    This null check is not required anymore.

serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java
<https://reviews.apache.org/r/14221/#comment51342>

    Seems like there is no way that ids could be null now. Lets remove this null check. If someone is indeed passing null, than we are just masking that bug, which should be fixed at caller site.

serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java
<https://reviews.apache.org/r/14221/#comment51358>

    This should return LinkedHashSet<Integer> instead.

serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java
<https://reviews.apache.org/r/14221/#comment51346>

    Caller should never pass null conf, Returning empty list is dangerous. Better to let it throw NPE on the caller than this.

serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java
<https://reviews.apache.org/r/14221/#comment51347>

    It doesn't seem like that this method is needed.

serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java
<https://reviews.apache.org/r/14221/#comment51348>

    It should be caller's responsibility to not pass in null here. We should not do this null check.

serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java
<https://reviews.apache.org/r/14221/#comment51349>

    Remove call to this new public method and just inline that logic in this method. We should keep our public methods to minimum.

serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java
<https://reviews.apache.org/r/14221/#comment51350>

    Again, no null check please : )
- Ashutosh Chauhan
On Sept. 19, 2013, 5:48 p.m., Yin Huai wrote: