Hive, mail # dev - Review Request 13059: HIVE-4850 Implement vector mode map join


Review Request 13059: HIVE-4850 Implement vector mode map join
Remus Rusanu 2013-07-30, 11:11

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13059/
-----------------------------------------------------------

Review request for hive, Eric Hanson and Jitendra Pandey.
Bugs: HIVE-4850
    https://issues.apache.org/jira/browse/HIVE-4850
Repository: hive-git
Description
-------

This is not the final iteration, but I thought it would be easier to discuss it with a review.
This implementation works and handles multiple aliases and multiple values per key. It uses the existing hash tables saved by the local task for the map join, which are row-mode hash tables (they have row-mode keys and store row-mode writable object values). Going forward we should avoid the conversions whose cost grows with the size of the big table: converting every big-table key to row mode, and converting the matching small-table values to vector data. This would require either converting the hash tables on the fly to vector-friendly ones (when loaded) or changing the local task's hashtable sink to create a vectorization-friendly hash table. The first approach may have memory consumption problems: potentially two hash tables end up in memory, so we would have to stream the transformation or transform while reading from the serialized format... nasty.
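
To make the conversion cost concrete, here is a minimal sketch in plain Java. It is not the code under review and uses no Hive classes: `RowModeProbeSketch`, `rowKey`, `buildSmallTable` and `probe` are hypothetical names, the "batch" is just a `long[]` key column, and only an inner join on a single long key is assumed. It shows a probe against a row-mode hash table, where each big-table row boxes its key into a row-mode object and each match unboxes a value back into a primitive output column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RowModeProbeSketch {

    // Hypothetical row-mode key: a boxed list, standing in for a writable key object.
    static List<Object> rowKey(long k) {
        List<Object> key = new ArrayList<>(1);
        key.add(Long.valueOf(k));
        return key;
    }

    // Builds the small-table hash: row-mode key -> all row-mode value rows for that key.
    // Each input row is {key, value}.
    static Map<List<Object>, List<Object[]>> buildSmallTable(long[][] rows) {
        Map<List<Object>, List<Object[]>> table = new HashMap<>();
        for (long[] r : rows) {
            table.computeIfAbsent(rowKey(r[0]), k -> new ArrayList<>())
                 .add(new Object[] { Long.valueOf(r[1]) });
        }
        return table;
    }

    // Probes with a primitive key column; returns {outputKeyColumn, outputValueColumn}.
    static long[][] probe(long[] bigKeyCol, Map<List<Object>, List<Object[]>> table) {
        long[] outKey = new long[bigKeyCol.length];
        long[] outVal = new long[bigKeyCol.length];
        int n = 0;
        for (int i = 0; i < bigKeyCol.length; i++) {
            // Conversion 1: vector key -> row-mode key, once per big-table row.
            List<Object> key = rowKey(bigKeyCol[i]);
            List<Object[]> matches = table.get(key);
            if (matches == null) {
                continue; // inner join: unmatched rows produce no output
            }
            for (Object[] valueRow : matches) {
                if (n == outKey.length) { // grow output columns when a key has many matches
                    int cap = Math.max(1, 2 * n);
                    outKey = Arrays.copyOf(outKey, cap);
                    outVal = Arrays.copyOf(outVal, cap);
                }
                outKey[n] = bigKeyCol[i];
                // Conversion 2: row-mode value -> primitive vector entry, once per match.
                outVal[n] = (Long) valueRow[0];
                n++;
            }
        }
        return new long[][] { Arrays.copyOf(outKey, n), Arrays.copyOf(outVal, n) };
    }

    public static void main(String[] args) {
        Map<List<Object>, List<Object[]>> table =
            buildSmallTable(new long[][] { {1, 10}, {1, 11}, {2, 20} });
        long[][] out = probe(new long[] { 1, 3, 2 }, table);
        System.out.println(Arrays.toString(out[0]));
        System.out.println(Arrays.toString(out[1]));
    }
}
```

A vector-friendly hash table would, under the same assumptions, be keyed directly on the primitive key and store values in a layout that can be copied into the output columns, removing both per-row conversions marked above.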
Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java 82d4b93
  ql/src/java/org/apache/hadoop/hive/ql/exec/JoinUtil.java 31dbf41
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 4da1be8
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 29de38d
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java e579c00
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinDoubleKeys.java d774226
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinObjectKey.java 791bb3f
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinObjectValue.java 58a9dc0
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinSingleKey.java 4bff936
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ColumnVector.java 8b4c615
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssign.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorExecMapper.java 083b9b9
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapOperator.java 41d2001
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 9c90230
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch.java ff13f89
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorExpressionWriterFactory.java 9e189c9
  ql/src/java/org/apache/hadoop/hive/ql/plan/HashTableDummyDesc.java f15ce48

Diff: https://reviews.apache.org/r/13059/diff/
Testing
-------

Manually ran some join queries on the alltypes_orc table.
Thanks,

Remus Rusanu