The expected .out files were created by running mvn test on Windows, so the issue is Windows-specific, not Parquet-specific. I'll investigate...

From: Remus Rusanu [mailto:[EMAIL PROTECTED]]
Sent: Monday, February 17, 2014 3:59 PM
To: [EMAIL PROTECTED]
Cc: Brock Noland
Subject: Why do I get statistics diff in EXPLAIN for Parquet?

Looking at the failed Jenkins runs for HIVE-5998, I see diffs in the statistics in the EXPLAIN output:

Running: diff -a /root/hive/itests/qtest/../../itests/qtest/target/qfile-results/clientpositive/vectorized_parquet.q.out /root/hive/itests/qtest/../../ql/src/test/results/clientpositive/vectorized_parquet.q.out

72c72
< Statistics: Num rows: 12288 Data size: 73728 Basic stats: COMPLETE Column stats: NONE
75c75
< Statistics: Num rows: 6144 Data size: 36864 Basic stats: COMPLETE Column stats: NONE
79c79
< Statistics: Num rows: 6144 Data size: 36864 Basic stats: COMPLETE Column stats: NONE
82c82
< Statistics: Num rows: 10 Data size: 60 Basic stats: COMPLETE Column stats: NONE
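
To compare such statistics lines mechanically rather than by eye, something like the following could be used. This is only a minimal sketch: it assumes the "Statistics: Num rows: N Data size: M" layout seen in the diff lines above, and the names parse_stats and diff_lines are made up for illustration, not part of Hive or its test framework.

```python
import re

# Matches the "Num rows" and "Data size" fields of an EXPLAIN Statistics line
# (format assumed from the diff output quoted above).
STATS_RE = re.compile(r"Statistics: Num rows: (\d+) Data size: (\d+)")


def parse_stats(lines):
    """Return a (num_rows, data_size) tuple for every line with a Statistics entry."""
    stats = []
    for line in lines:
        match = STATS_RE.search(line)
        if match:
            stats.append((int(match.group(1)), int(match.group(2))))
    return stats


# The four "<" lines from the diff above, copied verbatim.
diff_lines = [
    "< Statistics: Num rows: 12288 Data size: 73728 Basic stats: COMPLETE Column stats: NONE",
    "< Statistics: Num rows: 6144 Data size: 36864 Basic stats: COMPLETE Column stats: NONE",
    "< Statistics: Num rows: 6144 Data size: 36864 Basic stats: COMPLETE Column stats: NONE",
    "< Statistics: Num rows: 10 Data size: 60 Basic stats: COMPLETE Column stats: NONE",
]

for num_rows, data_size in parse_stats(diff_lines):
    # Bytes per row makes it easy to see whether both fields scale together.
    print(num_rows, data_size, data_size / num_rows)
```

For the four lines shown, the ratio of data size to row count is 6 in every case. Running the same parse on the other side of the diff would show whether the two sides disagree on the row count, the data size, or both.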

What would cause such statistics diffs? The Parquet file is created as:

create table if not exists alltypes_parquet (
  cint int,
  ctinyint tinyint,
  csmallint smallint,
  cfloat float,
  cdouble double,
  cstring1 string) stored as parquet;

insert overwrite table alltypes_parquet
  select cint,
    ctinyint,
    csmallint,
    cfloat,
    cdouble,
    cstring1
  from alltypesorc;

Note that there are no diffs in the actual query results.

Thanks,

~Remus