RE: Review Request 17899: HIVE-5998 Add vectorized reader for Parquet files
Hey Jitendra, can you double check the Parquet vectorized record reader is OK with regard to partitioning?

Thanks,
~Remus

From: Brock Noland [mailto:[EMAIL PROTECTED]] On Behalf Of Brock Noland
Sent: Friday, February 14, 2014 7:50 PM
To: Jitendra Pandey; Eric Hanson (BIG DATA); Brock Noland
Cc: Remus Rusanu; hive
Subject: Re: Review Request 17899: HIVE-5998 Add vectorized reader for Parquet files

This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/17899/
Seems fine to me! Someone familiar with vectorization should probably do the +1.
- Brock Noland
On February 14th, 2014, 11:07 a.m. UTC, Remus Rusanu wrote:
Review request for hive, Brock Noland, Eric Hanson, and Jitendra Pandey.
By Remus Rusanu.

Updated Feb. 14, 2014, 11:07 a.m.
Bugs: HIVE-5998<https://issues.apache.org/jira/browse/HIVE-5998>
Repository: hive-git
Description

The implementation is straightforward and very simple, but offers all the benefits of vectorization possible with a 'shallow' vectorized reader (i.e. one that does not go into parquet-mr project changes). The only complication arose because of discrepancies between the object inspector seen by the input format and the actual output provided by the Parquet readers (e.g. the OI declares 'byte' primitives but the Parquet reader outputs IntWritable). I had to create a just-in-time VectorColumnAssigner collection based on whatever writers the Parquet record reader provides. It is assumed the reader does not change its output during the iteration.
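
To illustrate the just-in-time idea, here is a minimal, self-contained sketch. It is not the code in the patch: IntValue, TextValue, Assigner, LongAssigner, StringAssigner, buildAssigners and fillBatch are all hypothetical stand-ins invented for this example, and no Hive or Hadoop classes are used. The point is only that the assigner for each column is picked from the runtime type of the first row's values rather than from the declared schema, and then reused for every later row on the assumption that the reader keeps emitting the same types.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch only: choose per-column assigners lazily from the first row seen. */
final class JitAssignerSketch {

    // Hypothetical stand-ins for the value wrappers a record reader emits.
    static final class IntValue {
        final int v;
        IntValue(int v) { this.v = v; }
    }

    static final class TextValue {
        final String v;
        TextValue(String v) { this.v = v; }
    }

    /** Hypothetical per-column assigner: copies one value into a column vector. */
    interface Assigner {
        void assign(int row, Object value);
    }

    /** Stores integer values widened into a long[] column. */
    static final class LongAssigner implements Assigner {
        final long[] vector;
        LongAssigner(int size) { vector = new long[size]; }
        @Override
        public void assign(int row, Object value) { vector[row] = ((IntValue) value).v; }
    }

    /** Stores text values in a String[] column. */
    static final class StringAssigner implements Assigner {
        final String[] vector;
        StringAssigner(int size) { vector = new String[size]; }
        @Override
        public void assign(int row, Object value) { vector[row] = ((TextValue) value).v; }
    }

    /** Picks an assigner per column from the runtime type of the first row's values. */
    static Assigner[] buildAssigners(Object[] firstRow, int batchSize) {
        Assigner[] assigners = new Assigner[firstRow.length];
        for (int c = 0; c < firstRow.length; c++) {
            Object value = firstRow[c];
            if (value instanceof IntValue) {
                assigners[c] = new LongAssigner(batchSize);
            } else if (value instanceof TextValue) {
                assigners[c] = new StringAssigner(batchSize);
            } else {
                // Nulls and other types are deliberately left out of this sketch.
                throw new IllegalArgumentException("Unsupported value in column " + c);
            }
        }
        return assigners;
    }

    /** Fills one batch; assigners are created once, on the first row, then reused. */
    static Assigner[] fillBatch(List<Object[]> rows) {
        Assigner[] assigners = null; // stays null if there are no rows
        for (int r = 0; r < rows.size(); r++) {
            Object[] row = rows.get(r);
            if (assigners == null) {
                assigners = buildAssigners(row, rows.size());
            }
            for (int c = 0; c < row.length; c++) {
                assigners[c].assign(r, row[c]);
            }
        }
        return assigners;
    }

    public static void main(String[] args) {
        List<Object[]> rows = Arrays.asList(
            new Object[] { new IntValue(7), new TextValue("a") },
            new Object[] { new IntValue(9), new TextValue("b") });
        Assigner[] assigners = fillBatch(rows);
        System.out.println(Arrays.toString(((LongAssigner) assigners[0]).vector));
        System.out.println(Arrays.toString(((StringAssigner) assigners[1]).vector));
    }
}
```

If a later row carried a different value type for the same column, the cast inside the assigner would fail with a ClassCastException, which is where the assumption about stable reader output shows up in this sketch.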
Testing

Manually tested. A new .q query test was added.
Diffs

  *   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java (d1a75df)
  *   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch.java (0b504de)
  *   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatchCtx.java (d409d44)
  *   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java (d3412df)
  *   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/VectorizedParquetInputFormat.java (PRE-CREATION)
  *   ql/src/test/queries/clientpositive/vectorized_parquet.q (PRE-CREATION)
  *   ql/src/test/results/clientpositive/vectorized_parquet.q.out (PRE-CREATION)

View Diff<https://reviews.apache.org/r/17899/diff/>