Hive, mail # dev - Review Request: HIVE-4160 Add support for string type keys in vectorized GROUP BY

Remus Rusanu 2013-05-23, 13:28

Review request for hive, Jitendra Pandey, Eric Hanson, and Sarvesh Sakalanaga.

Extend VectorHashKeyWrapper and VectorHashKeyWrapperBatch to support BytesColumnVector (i.e. string) keys. The addition fits the existing VectorHashKeyWrapper behavior: string key support is 'compiled' once per query into a VectorHashKeyWrapperBatch instance. VectorHashKeyWrapper is extended to support byte[] keys. It stores the key values the same way the BytesColumnVector class does, using a byte[][], a start int[] and a length int[]. During batch processing there is no value copy: the key wrappers take a reference to the data in the batch (i.e. they refer to the same byte[] and copy only the start/length). This avoids potentially expensive size-of-key copy operations *before* the hash probe. The VectorHashKeyWrapper cloning that occurs when a probe reveals a missing key in the hash does copy the key (it must), and this is the only time the key values are copied.
This addresses bug HIVE-4160.
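
To illustrate the reference-then-copy-on-miss idea described above, here is a minimal standalone sketch. It is not the code in the patch: the class ByteKeyWrapperSketch and the methods assignRef, copyKey and countKeys are hypothetical names invented for this example, and it assumes non-null keys and a plain java.util.HashMap standing in for the operator's hash table.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch only; not the actual VectorHashKeyWrapper. */
final class ByteKeyWrapperSketch {
    // One slot per string key column, mirroring the byte[][] / start / length layout.
    private final byte[][] bytes;
    private final int[] start;
    private final int[] length;

    ByteKeyWrapperSketch(int keyCount) {
        bytes = new byte[keyCount][];
        start = new int[keyCount];
        length = new int[keyCount];
    }

    /** Point this wrapper at batch data: stores a reference, copies no key bytes. */
    void assignRef(int index, byte[] data, int offset, int len) {
        bytes[index] = data;
        start[index] = offset;
        length[index] = len;
    }

    /** Deep copy, intended only for the case where a probe misses. */
    ByteKeyWrapperSketch copyKey() {
        ByteKeyWrapperSketch copy = new ByteKeyWrapperSketch(bytes.length);
        for (int i = 0; i < bytes.length; i++) {
            copy.bytes[i] = Arrays.copyOfRange(bytes[i], start[i], start[i] + length[i]);
            copy.start[i] = 0;
            copy.length[i] = length[i];
        }
        return copy;
    }

    @Override
    public int hashCode() {
        int h = 1;
        for (int i = 0; i < bytes.length; i++) {
            h = 31 * h + length[i];
            for (int j = start[i]; j < start[i] + length[i]; j++) {
                h = 31 * h + bytes[i][j];
            }
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ByteKeyWrapperSketch)) {
            return false;
        }
        ByteKeyWrapperSketch other = (ByteKeyWrapperSketch) o;
        if (other.bytes.length != bytes.length) {
            return false;
        }
        for (int i = 0; i < bytes.length; i++) {
            if (length[i] != other.length[i]) {
                return false;
            }
            // Compare the referenced ranges; the offsets themselves may differ.
            for (int j = 0; j < length[i]; j++) {
                if (bytes[i][start[i] + j] != other.bytes[i][other.start[i] + j]) {
                    return false;
                }
            }
        }
        return true;
    }

    /** Counts rows per single string key; each row is a (start, length) slice of buffer. */
    static Map<ByteKeyWrapperSketch, long[]> countKeys(byte[] buffer, int[] starts, int[] lengths) {
        Map<ByteKeyWrapperSketch, long[]> counts = new HashMap<>();
        ByteKeyWrapperSketch probe = new ByteKeyWrapperSketch(1); // reused for every row
        for (int row = 0; row < starts.length; row++) {
            probe.assignRef(0, buffer, starts[row], lengths[row]); // no byte copy before the probe
            long[] acc = counts.get(probe);
            if (acc == null) {
                acc = new long[1];
                counts.put(probe.copyKey(), acc); // miss: the only place key bytes are copied
            }
            acc[0]++;
        }
        return counts;
    }
}
```

In this sketch the probe wrapper is reused across rows and only ever holds references, so a hit costs a hash and a range comparison. The copy in copyKey is needed on a miss because the batch buffer may be overwritten by the next batch while the stored key must stay valid.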

  ql/src/java/org/apache/hadoop/hive/ql/exec/VectorHashKeyWrapper.java 35712d0
  ql/src/java/org/apache/hadoop/hive/ql/exec/VectorHashKeyWrapperBatch.java c23614c
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 1ef4955
  ql/src/test/org/apache/hadoop/hive/ql/exec/vector/FakeVectorRowBatchFromObjectIterables.java PRE-CREATION
  ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorGroupByOperator.java b3b5cd2
  ql/src/test/org/apache/hadoop/hive/ql/exec/vector/util/FakeVectorRowBatchFromIterables.java cf3399d
  ql/src/test/org/apache/hadoop/hive/ql/exec/vector/util/FakeVectorRowBatchFromLongIterables.java PRE-CREATION

Diff: https://reviews.apache.org/r/11345/diff/

Extended vectorized GROUP BY unit test to cover String keys for some cases.

Remus Rusanu