Re: Review Request 13059: HIVE-4850 Implement vector mode map join

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13059/
-----------------------------------------------------------

(Updated Oct. 3, 2013, 2:17 p.m.)
Review request for hive, Eric Hanson and Jitendra Pandey.
Bugs: HIVE-4850
    https://issues.apache.org/jira/browse/HIVE-4850
Repository: hive-git
Description
-------

This is not the final iteration, but I thought it would be easier to discuss it with a review.
This implementation works and handles multiple aliases and multiple values per key. It uses the existing hash tables saved by the local task for the map join, which are row-mode hash tables (they have row-mode keys and store row-mode writable object values). Going forward we should avoid the size-of-big-table conversions of big table keys to row mode and the conversion of small table values to vector data. This would require either converting the hash tables on the fly to vector-friendly ones (when loaded) or changing the local task hashtable sink to create a vectorization-friendly hash. The first approach may have memory consumption problems (potentially two hash tables end up in memory; we would have to stream the transformation or transform while reading from the serialized format... nasty).
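
To make the conversion cost concrete, here is a minimal standalone sketch of probing a row-mode hash table from a vectorized batch. It does not use the Hive classes from the patch: RowModeProbeSketch, OutputBatch and probe are hypothetical names, a long[] stands in for a big-table key column, and a Map<Long, List<String>> stands in for the row-mode hash table loaded from the local task. It only illustrates where the two conversions happen and how multiple values per key expand into output rows.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; none of these types exist in Hive. */
final class RowModeProbeSketch {

    /** Column-oriented output: one primitive key column and one value column. */
    static final class OutputBatch {
        long[] keys = new long[4];
        String[] values = new String[4];
        int size;

        void append(long key, String value) {
            if (size == keys.length) {
                // Grow both columns together so they stay aligned.
                keys = Arrays.copyOf(keys, size * 2);
                values = Arrays.copyOf(values, size * 2);
            }
            keys[size] = key;
            values[size] = value;
            size++;
        }
    }

    /** Inner-join probe of a row-mode table with one batch of big-table keys. */
    static OutputBatch probe(long[] keyColumn, int batchSize,
                             Map<Long, List<String>> rowModeTable) {
        OutputBatch out = new OutputBatch();
        for (int i = 0; i < batchSize; i++) {
            // Conversion 1: primitive vector entry -> boxed row-mode key.
            // This happens once per big-table row.
            Long rowModeKey = Long.valueOf(keyColumn[i]);
            List<String> matches = rowModeTable.get(rowModeKey);
            if (matches == null) {
                continue; // no match: an inner join drops the row
            }
            // Conversion 2: each row-mode value is copied back into
            // column form, once per matching small-table row.
            for (String value : matches) {
                out.append(keyColumn[i], value);
            }
        }
        return out;
    }
}
```

In this sketch both conversions sit inside the per-row loop, which is the cost a vectorization-friendly hash table would remove.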
Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java d320b47
  ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java 86db044
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 153b8ea
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 8ab5395
  ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java cde1a59
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ColumnVector.java 8b4c615
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssign.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorHashKeyWrapperBatch.java 9955d09
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorReduceSinkOperator.java 6df3551
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 02ebe14
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch.java ff13f89
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorExpressionWriterFactory.java 9e189c9
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java df1c5a6
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java a72ec8b

Diff: https://reviews.apache.org/r/13059/diff/
Testing
-------

Manually ran some join queries on the alltypes_orc table.
Thanks,

Remus Rusanu