Hive, mail # dev - Review Request 13059: HIVE-4850 Implement vector mode map join

Re: Review Request 13059: HIVE-4850 Implement vector mode map join
Remus Rusanu 2013-10-03, 14:20


(Updated Oct. 3, 2013, 2:20 p.m.)
Review request for hive, Eric Hanson and Jitendra Pandey.
Bugs: HIVE-4850
Repository: hive-git
Description (updated)

This is a working implementation based on current trunk. It is simpler than the .1 patch in that it delegates the JOIN entirely to the row-mode MapJoinOperator: the vectorized operator literally calls the row-mode implementation for each row in the input batch and collects whatever the row-mode operator forwards into the output batch. This is not as bad as it seems, because the JOIN operator has to resort to row-mode operations anyway: the small tables (hash tables) are row-mode (objects and object inspectors). By delegating the entire join logic to row mode, we piggyback on the correctness of the existing implementation. I do plan to come up with a fully vectorized implementation, but that would require changes to hash table creation and serialization.

Note that filtering and key evaluation on the big table do use vectorized operators; row mode applies only to the hash table key lookup and to the JOIN logic.
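The delegation described above can be sketched in a few lines. This is a minimal, self-contained Java model under stated assumptions, not the VectorMapJoinOperator from the patch: `LongBatch`, `RowModeJoin` and `DelegatingVectorJoin` are hypothetical stand-ins for VectorizedRowBatch, the row-mode MapJoinOperator and the vectorized wrapper. All columns are longs, the small table is a plain HashMap, and details a real batch carries (selection vectors, null flags, object inspectors) are ignored. It shows the per-row loop: each batch row is unpacked, handed to the row-mode join, and every row that join forwards is appended to an output batch that is emitted when full.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class VectorDelegatingJoinSketch {

    /** Hypothetical columnar batch: every column is a long[] vector of the same capacity. */
    static final class LongBatch {
        final long[][] cols;
        int size;
        LongBatch(int numCols, int capacity) { cols = new long[numCols][capacity]; }
        boolean isFull() { return size == cols[0].length; }
    }

    /** Hypothetical row-mode join: probes a small-table hash map, forwards each joined row. */
    static final class RowModeJoin {
        private final Map<Long, List<Long>> smallTable;
        private Consumer<long[]> forward;
        RowModeJoin(Map<Long, List<Long>> smallTable) { this.smallTable = smallTable; }
        void setForward(Consumer<long[]> forward) { this.forward = forward; }
        /** Inner join on big-table column 0; forwards {key, bigValue, smallValue}. */
        void process(long[] row) {
            List<Long> matches = smallTable.get(row[0]);
            if (matches == null) {
                return; // no match: an inner join drops the row
            }
            for (long smallValue : matches) {
                forward.accept(new long[] {row[0], row[1], smallValue});
            }
        }
    }

    /** Hypothetical vectorized wrapper: unpacks each row, delegates it, repacks forwarded rows. */
    static final class DelegatingVectorJoin {
        private static final int OUT_COLS = 3;
        private final int capacity;
        private final RowModeJoin rowJoin;
        private LongBatch out;
        final List<LongBatch> emitted = new ArrayList<>();
        DelegatingVectorJoin(RowModeJoin rowJoin, int capacity) {
            this.rowJoin = rowJoin;
            this.capacity = capacity;
            this.out = new LongBatch(OUT_COLS, capacity);
            rowJoin.setForward(this::append); // collect every forwarded row into `out`
        }

        private void append(long[] joined) {
            for (int c = 0; c < OUT_COLS; c++) {
                out.cols[c][out.size] = joined[c];
            }
            out.size++;
            if (out.isFull()) {
                flush(); // emit the full batch and start a new one
            }
        }

        void processBatch(LongBatch in) {
            long[] row = new long[in.cols.length]; // scratch row reused across batch rows
            for (int r = 0; r < in.size; r++) {
                for (int c = 0; c < in.cols.length; c++) {
                    row[c] = in.cols[c][r];
                }
                rowJoin.process(row); // the row-mode logic performs the actual join
            }
        }

        /** Emits a partially filled output batch, e.g. at end of input. */
        void flush() {
            if (out.size > 0) {
                emitted.add(out);
                out = new LongBatch(OUT_COLS, capacity);
            }
        }
    }

    public static void main(String[] args) {
        Map<Long, List<Long>> small = new HashMap<>();
        small.put(1L, Arrays.asList(100L, 101L));
        small.put(3L, Arrays.asList(300L));

        LongBatch big = new LongBatch(2, 4);
        big.cols[0] = new long[] {1, 2, 3, 1};
        big.cols[1] = new long[] {10, 20, 30, 11};
        big.size = 4;

        DelegatingVectorJoin join = new DelegatingVectorJoin(new RowModeJoin(small), 4);
        join.processBatch(big);
        join.flush();
        int total = join.emitted.stream().mapToInt(b -> b.size).sum();
        System.out.println(join.emitted.size() + " batches, " + total + " joined rows");
    }
}
```

Running `main` prints `2 batches, 5 joined rows`: four big-table rows produce five joined rows, which overflow the four-row output batch once. The sketch mirrors the trade-off described above: the row-at-a-time call gives up a tight columnar inner loop, but the join semantics come entirely from the existing row-mode code.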

Diffs

  ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java d320b47
  ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java 86db044
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 153b8ea
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 8ab5395
  ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java cde1a59
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ColumnVector.java 8b4c615
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssign.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorHashKeyWrapperBatch.java 9955d09
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorReduceSinkOperator.java 6df3551
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 02ebe14
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch.java ff13f89
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorExpressionWriterFactory.java 9e189c9
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java df1c5a6
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java a72ec8b

Diff: https://reviews.apache.org/r/13059/diff/

Testing

Manually ran some join queries on the alltypes_orc table.

Remus Rusanu