Hive, mail # dev - Re: Review Request: HIVE-4595 Add support for string type keys in vectorized GROUP BY

Re: Review Request: HIVE-4595 Add support for string type keys in vectorized GROUP BY
Remus Rusanu 2013-05-24, 09:37

This is an automatically generated e-mail. To reply, visit:

(Updated May 24, 2013, 9:37 a.m.)
Review request for hive, Jitendra Pandey, Eric Hanson, and Sarvesh Sakalanaga.

Fix keyHash loop

Extend the VectorHashKeyWrapper and VectorHashKeyWrapperBatch to support ByteColumnVector (i.e. string) keys. The addition fits into the existing VectorHashKeyWrapper behavior: the string key support is 'compiled' once per query into a VectorHashKeyWrapperBatch instance. The VectorHashKeyWrapper is extended to support byte[] keys. It stores the key values just like the ByteColumnVector class, using a byte[][], a start int[] and a length int[]. During batch processing there is no value copy: the key wrappers take a reference to the data from the batch (i.e. they refer to the same byte[] and copy only the start/length). This avoids potentially expensive size-of-key copy operations *before* the hash probe. The VectorHashKeyWrapper cloning that occurs when a probe reveals a missing key in the hash will copy the key (it must), and this is the only time we copy the key values.
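
To illustrate the reference-then-copy-on-miss idea described above, here is a minimal sketch. It is not the code from the patch: ByteKeyRef, GroupCountSketch and all of their methods are hypothetical names, the key is a single string column, and a plain java.util.HashMap stands in for the operator's hash table.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified single-column key wrapper (not the Hive class).
final class ByteKeyRef {
    private byte[] bytes;
    private int start;
    private int length;
    private int hash;

    // Point at a slice of the batch buffer; no key bytes are copied here.
    void assign(byte[] bytes, int start, int length) {
        this.bytes = bytes;
        this.start = start;
        this.length = length;
        int h = 1;
        for (int i = start; i < start + length; i++) {
            h = 31 * h + bytes[i];
        }
        this.hash = h;
    }

    // Deep copy of the key bytes, needed only when the key is new to the table.
    ByteKeyRef copyKey() {
        ByteKeyRef k = new ByteKeyRef();
        k.assign(Arrays.copyOfRange(bytes, start, start + length), 0, length);
        return k;
    }

    // True if this wrapper still refers to the given buffer instance.
    boolean sharesBufferWith(byte[] buffer) {
        return bytes == buffer;
    }

    @Override
    public int hashCode() {
        return hash;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ByteKeyRef)) {
            return false;
        }
        ByteKeyRef other = (ByteKeyRef) o;
        if (length != other.length) {
            return false;
        }
        for (int i = 0; i < length; i++) {
            if (bytes[start + i] != other.bytes[other.start + i]) {
                return false;
            }
        }
        return true;
    }
}

final class GroupCountSketch {
    // Counts rows per key; starts/lengths describe each row's slice of buffer.
    static Map<ByteKeyRef, long[]> count(byte[] buffer, int[] starts, int[] lengths) {
        Map<ByteKeyRef, long[]> counts = new HashMap<>();
        ByteKeyRef probe = new ByteKeyRef(); // reused for every row
        for (int row = 0; row < starts.length; row++) {
            probe.assign(buffer, starts[row], lengths[row]); // reference only
            long[] counter = counts.get(probe);              // hash probe, no copy
            if (counter == null) {
                counter = new long[1];
                counts.put(probe.copyKey(), counter);        // the only key copy
            }
            counter[0]++;
        }
        return counts;
    }
}
```

In this sketch the probe object is reused across rows and only points into the batch buffer, so a row whose key is already in the table costs a hash computation and a byte comparison but no allocation. The stored keys own their bytes, so they stay valid after the batch buffer is reused.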
This addresses bug HIVE-4595.
Diffs (updated)

  ql/src/java/org/apache/hadoop/hive/ql/exec/VectorHashKeyWrapper.java 35712d0
  ql/src/java/org/apache/hadoop/hive/ql/exec/VectorHashKeyWrapperBatch.java c23614c
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 1ef4955
  ql/src/test/org/apache/hadoop/hive/ql/exec/vector/FakeVectorRowBatchFromObjectIterables.java PRE-CREATION
  ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorGroupByOperator.java b3b5cd2
  ql/src/test/org/apache/hadoop/hive/ql/exec/vector/util/FakeVectorRowBatchFromIterables.java cf3399d
  ql/src/test/org/apache/hadoop/hive/ql/exec/vector/util/FakeVectorRowBatchFromLongIterables.java PRE-CREATION

Diff: https://reviews.apache.org/r/11345/diff/

Extended vectorized GROUP BY unit test to cover String keys for some cases.

Remus Rusanu